
Showing papers on "Workflow published in 2011"


Journal ArticleDOI
TL;DR: Typical topics and problems encountered during data processing of diffraction experiments are discussed and the tools provided in the autoPROC software are described.
Abstract: A typical diffraction experiment will generate many images and data sets from different crystals in a very short time. This creates a challenge for the high-throughput operation of modern synchrotron beamlines as well as for the subsequent data processing. Novice users in particular may feel overwhelmed by the tables, plots and numbers that the different data-processing programs and software packages present to them. Here, some of the more common problems that a user has to deal with when processing a set of images that will finally make up a processed data set are shown, concentrating on difficulties that may often show up during the first steps along the path of turning the experiment (i.e. data collection) into a model (i.e. interpreted electron density). Difficulties such as unexpected crystal forms, issues in crystal handling and suboptimal choices of data-collection strategies can often be dealt with, or at least diagnosed, by analysing specific data characteristics during processing. In the end, one wants to distinguish problems over which one has no immediate control once the experiment is finished from problems that can be remedied a posteriori. A new software package, autoPROC, is also presented that combines third-party processing programs with new tools and an automated workflow script that is intended to provide users with both guidance and insight into the offline processing of data affected by the difficulties mentioned above, with particular emphasis on the automated treatment of multi-sweep data sets collected on multi-axis goniostats.

1,239 citations


Journal ArticleDOI
Li Da Xu
TL;DR: The state of the art in the area of enterprise systems as they relate to industrial informatics is surveyed, highlighting formal methods and systems methods crucial for modeling complex enterprise systems, which poses unique challenges.
Abstract: Rapid advances in industrial information integration methods have spurred tremendous growth in the use of enterprise systems. Consequently, a variety of techniques have been used for probing enterprise systems. These techniques include business process management, workflow management, Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA), grid computing, and others. Many applications require a combination of these techniques, which is giving rise to the emergence of enterprise systems. Development of the techniques has originated from different disciplines and has the potential to significantly improve the performance of enterprise systems. However, the lack of powerful tools still poses a major hindrance to exploiting the full potential of enterprise systems. In particular, formal methods and systems methods are crucial for modeling complex enterprise systems, which poses unique challenges. In this paper, we briefly survey the state of the art in the area of enterprise systems as they relate to industrial informatics.

637 citations


Proceedings ArticleDOI
12 Nov 2011
TL;DR: This paper presents an approach whereby the basic computing elements are virtual machines (VMs) of various sizes/costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure all jobs are finished within their deadlines at minimum financial cost.
Abstract: A goal in cloud computing is to allocate (and thus pay for) only those cloud resources that are truly needed. To date, cloud practitioners have pursued schedule-based (e.g., time-of-day) and rule-based mechanisms to attempt to automate this matching between computing requirements and computing resources. However, most of these "auto-scaling" mechanisms only support simple resource utilization indicators and do not specifically consider both user performance requirements and budget concerns. In this paper, we present an approach whereby the basic computing elements are virtual machines (VMs) of various sizes/costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure all jobs are finished within their deadlines at minimum financial cost. We accomplish our goal by dynamically allocating/deallocating VMs and scheduling tasks on the most cost-efficient instances. We evaluate our approach in four representative cloud workload patterns and show cost savings from 9.8% to 40.4% compared to other approaches.
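The abstract does not spell out the allocation procedure, but the core trade-off it describes can be sketched in a few lines: for each task, pick the cheapest instance type whose runtime still fits the task's (soft) deadline. The VM catalogue, prices and speedups below are hypothetical, and the sketch ignores data transfer and scheduling interactions.

```python
# Minimal sketch of deadline-aware instance selection (illustrative only; not the
# authors' implementation). Each VM type has an hourly price and a relative speed.
import math

VM_TYPES = [                      # hypothetical catalogue
    {"name": "small",  "price": 0.10, "speedup": 1.0},
    {"name": "medium", "price": 0.20, "speedup": 1.8},
    {"name": "large",  "price": 0.40, "speedup": 3.2},
]

def cheapest_vm_meeting_deadline(task_hours_on_small, deadline_hours):
    """Return the cheapest VM type whose runtime fits the (soft) deadline."""
    best = None
    for vm in VM_TYPES:
        runtime = task_hours_on_small / vm["speedup"]
        cost = math.ceil(runtime) * vm["price"]      # clouds bill per started hour
        if runtime <= deadline_hours and (best is None or cost < best[1]):
            best = (vm["name"], cost)
    return best                                       # None if no type meets the deadline

print(cheapest_vm_meeting_deadline(task_hours_on_small=6.0, deadline_hours=4.0))
```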

556 citations


Journal ArticleDOI
TL;DR: It is shown that the eight soundness notions described in the literature are decidable for workflow nets; however, most extensions will make all of these notions undecidable.
Abstract: Workflow nets, a particular class of Petri nets, have become one of the standard ways to model and analyze workflows. Typically, they are used as an abstraction of the workflow that is used to check the so-called soundness property. This property guarantees the absence of livelocks, deadlocks, and other anomalies that can be detected without domain knowledge. Several authors have proposed alternative notions of soundness and have suggested to use more expressive languages, e.g., models with cancellations or priorities. This paper provides an overview of the different notions of soundness and investigates these in the presence of different extensions of workflow nets. We will show that the eight soundness notions described in the literature are decidable for workflow nets. However, most extensions will make all of these notions undecidable. These new results show the theoretical limits of workflow verification. Moreover, we discuss some of the analysis approaches described in the literature.
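For a bounded workflow net, classical soundness (option to complete, proper completion, no dead transitions) can be checked by exhaustively exploring the reachable markings. The sketch below is an illustrative brute-force check on a tiny two-transition net, not the decision procedures analysed in the paper.

```python
# Illustrative brute-force soundness check for a small, bounded workflow net.
# Net: place i -> t1 -> p -> t2 -> o   (transition: (input places, output places))
TRANSITIONS = {"t1": ({"i"}, {"p"}), "t2": ({"p"}, {"o"})}
INITIAL, FINAL = frozenset({("i", 1)}), frozenset({("o", 1)})

def fire(marking, pre, post):
    m = dict(marking)
    if any(m.get(p, 0) < 1 for p in pre):
        return None                       # transition not enabled
    for p in pre:  m[p] -= 1
    for p in post: m[p] = m.get(p, 0) + 1
    return frozenset((p, n) for p, n in m.items() if n > 0)

def reachable(start):
    seen, frontier = {start}, [start]
    while frontier:
        m = frontier.pop()
        for pre, post in TRANSITIONS.values():
            nxt = fire(m, pre, post)
            if nxt is not None and nxt not in seen:
                seen.add(nxt); frontier.append(nxt)
    return seen

markings = reachable(INITIAL)
option_to_complete = all(FINAL in reachable(m) for m in markings)
proper_completion = all(m == FINAL for m in markings if dict(m).get("o", 0) >= 1)
no_dead_transitions = all(
    any(fire(m, pre, post) is not None for m in markings)
    for pre, post in TRANSITIONS.values())
print(option_to_complete and proper_completion and no_dead_transitions)  # True
```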

335 citations


Journal ArticleDOI
TL;DR: The authors explore the tension between expressivity and structured clinical documentation, review methods for obtaining reusable data from clinical notes, and recommend that healthcare providers be able to choose how to document patient care based on workflow and note content needs.

326 citations


Journal ArticleDOI
TL;DR: It is demonstrated that hybrid operators, which combine two pure operators, reduce the number of duplicate structures in the search, which allows for better exploration of the potential energy surface of the system in question, while simultaneously zooming in on the most promising regions.

266 citations


Journal ArticleDOI
TL;DR: The types of software tools that are required at different stages of systems biology research and the current options that are available for systems biology researchers are described.
Abstract: Understanding complex biological systems requires extensive support from software tools. Such tools are needed at each step of a systems biology computational workflow, which typically consists of data handling, network inference, deep curation, dynamical simulation and model analysis. In addition, there are now efforts to develop integrated software platforms, so that tools that are used at different stages of the workflow and by different researchers can easily be used together. This Review describes the types of software tools that are required at different stages of systems biology research and the current options that are available for systems biology researchers. We also discuss the challenges and prospects for modelling the effects of genetic changes on physiology and the concept of an integrated platform.

262 citations


Posted Content
TL;DR: The authors integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers and used humans to compare items for sorting and joining data, two of the most common operations in DBMSs.
Abstract: Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
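The rating optimization mentioned above replaces roughly n*(n-1)/2 pairwise-comparison HITs with n per-item rating HITs and then sorts by mean rating. The sketch below illustrates that trade-off with a simulated worker; it is not the Qurk query interface, and the rating function and vote counts are invented.

```python
# Minimal sketch of the rating-based sort optimization (illustrative only).
import random

def crowd_rating(item, scale=7):
    """Stand-in for a human rating HIT: a noisy score correlated with true quality."""
    return min(scale, max(1, item["true_quality"] + random.choice([-1, 0, 0, 1])))

def sort_with_ratings(items, votes_per_item=3):
    hits = 0
    for it in items:
        scores = []
        for _ in range(votes_per_item):
            scores.append(crowd_rating(it)); hits += 1
        it["rating"] = sum(scores) / len(scores)
    return sorted(items, key=lambda it: it["rating"], reverse=True), hits

items = [{"id": i, "true_quality": q} for i, q in enumerate([2, 5, 3, 7, 4])]
ranked, hits_used = sort_with_ratings(items)
pairwise_hits = len(items) * (len(items) - 1) // 2 * 3   # 3 assignments per pair
print([it["id"] for it in ranked], hits_used, "HITs vs", pairwise_hits, "for pairwise")
```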

262 citations


Journal ArticleDOI
TL;DR: This paper presents HCOC: The Hybrid Cloud Optimized Cost scheduling algorithm, which decides which resources should be leased from the public cloud and aggregated to the private cloud to provide sufficient processing power to execute a workflow within a given execution time.
Abstract: Workflows have been used to represent a variety of applications involving high processing and storage demands. As a solution to supply this necessity, the cloud computing paradigm has emerged as an on-demand resource provider. While public clouds charge users on a per-use basis, private clouds are owned by users and can be utilized with no charge. When a public cloud and a private cloud are merged, we have what we call a hybrid cloud. In a hybrid cloud, the user has elasticity provided by public cloud resources that can be aggregated to the private resources pool as necessary. One question faced by the users in such systems is: Which are the best resources to request from a public cloud based on the current demand and on resource costs? In this paper we deal with this problem, presenting HCOC: The Hybrid Cloud Optimized Cost scheduling algorithm. HCOC decides which resources should be leased from the public cloud and aggregated to the private cloud to provide sufficient processing power to execute a workflow within a given execution time. We present extensive experimental and simulation results which show that HCOC can reduce costs while achieving the established desired execution time.
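A heavily simplified view of the decision HCOC addresses: if the private cloud alone cannot meet the desired execution time, lease public VMs until the estimated makespan fits, paying only for what is leased. The instance type, prices and the perfectly parallel makespan model below are assumptions for illustration, not the published algorithm.

```python
# Rough sketch of the hybrid-cloud leasing decision (a greedy simplification).
import math

PRIVATE_CORES = 8                                # free, already owned
PUBLIC_VM = {"cores": 4, "hourly_cost": 0.34}    # hypothetical instance type

def plan(total_core_hours, deadline_hours):
    leased, cost = 0, 0.0
    def makespan(extra_vms):
        cores = PRIVATE_CORES + extra_vms * PUBLIC_VM["cores"]
        return total_core_hours / cores          # idealized, perfectly parallel work
    while makespan(leased) > deadline_hours:
        leased += 1
        cost += math.ceil(deadline_hours) * PUBLIC_VM["hourly_cost"]
    return leased, cost, makespan(leased)

print(plan(total_core_hours=200, deadline_hours=10))   # (VMs leased, cost, makespan)
```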

261 citations


Journal ArticleDOI
01 Sep 2011
TL;DR: This paper describes how MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task, and proposes a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them.
Abstract: Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.

259 citations


Journal ArticleDOI
TL;DR: This paper suggests an architecture for the automatic execution of large-scale workflow-based applications on dynamically and elastically provisioned computing resources using the core algorithm named PBTS (Partitioned Balanced Time Scheduling), which estimates the minimum number of computing hosts required to execute a workflow within a user-specified finish time.

Book
28 Aug 2011
TL;DR: This approach is based on exploiting a generic and reusable body of knowledge concerning what kinds of exceptions can occur in collaborative work processes and how these exceptions can be handled.
Abstract: This paper describes a novel knowledge-based approach for helping workflow process designers and participants better manage the exceptions (deviations from an ideal collaborative work process caused by errors, failures, resource or requirements changes, etc.) that can occur during the enactment of a workflow. This approach is based on exploiting a generic and reusable body of knowledge concerning what kinds of exceptions can occur in collaborative work processes, and how these exceptions can be handled (detected, diagnosed and resolved). This work builds upon previous efforts from the MIT Process Handbook project and from research on conflict management in collaborative design.
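One way to picture such a reusable body of knowledge is as a taxonomy of exception types, each with detection, diagnosis and resolution knowledge attached. The entries and handler strings below are invented for illustration and are not taken from the MIT Process Handbook.

```python
# Toy sketch of a reusable exception-handling knowledge base (entries are made up).
EXCEPTION_KB = {
    "agent_unavailable": {
        "detect":   "task unassigned past its start time",
        "diagnose": "assigned participant is absent or overloaded",
        "resolve":  ["reassign task to peer", "escalate to supervisor"],
    },
    "requirements_change": {
        "detect":   "specification edited after task started",
        "diagnose": "upstream decision invalidated current work",
        "resolve":  ["notify affected tasks", "re-plan downstream steps"],
    },
}

def advise(exception_type):
    entry = EXCEPTION_KB.get(exception_type)
    if entry is None:
        return ["no generic knowledge; escalate to process designer"]
    return entry["resolve"]

print(advise("agent_unavailable"))
```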

Proceedings ArticleDOI
04 Jul 2011
TL;DR: The preliminary experiments show that SHEFT not only outperforms several representative workflow scheduling algorithms in optimizing workflow execution time, but also enables resources to scale elastically at runtime.
Abstract: Most existing workflow scheduling algorithms only consider a computing environment in which the number of compute resources is bounded. Compute resources in such an environment usually cannot be provisioned or released on demand of the size of a workflow, and these resources are not released to the environment until an execution of the workflow completes. To address the problem, we firstly formalize a model of a Cloud environment and a workflow graph representation for such an environment. Then, we propose the SHEFT workflow scheduling algorithm to schedule a workflow elastically on a Cloud computing environment. Our preliminary experiments show that SHEFT not only outperforms several representative workflow scheduling algorithms in optimizing workflow execution time, but also enables resources to scale elastically at runtime.
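The abstract does not detail SHEFT itself, but schedulers in this family typically start from a HEFT-style "upward rank" that orders tasks by the length of their longest downstream path. The sketch below computes that rank for a toy DAG; the task times and communication cost are assumptions.

```python
# Sketch of HEFT-style upward-rank task prioritization (background, not SHEFT itself).
from functools import lru_cache

# Task graph: task -> (mean execution time, successors)
DAG = {
    "A": (3.0, ["B", "C"]),
    "B": (4.0, ["D"]),
    "C": (2.0, ["D"]),
    "D": (1.0, []),
}
MEAN_COMM = 1.0          # assumed mean data-transfer cost between dependent tasks

@lru_cache(maxsize=None)
def upward_rank(task):
    cost, successors = DAG[task]
    return cost + max((MEAN_COMM + upward_rank(s) for s in successors), default=0.0)

priority = sorted(DAG, key=upward_rank, reverse=True)
print(priority)          # tasks in decreasing rank order, e.g. ['A', 'B', 'C', 'D']
```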

Journal ArticleDOI
TL;DR: This work covers the entire workflow of metabolomics studies, starting from experimental design and sample-size determination to tools that can aid in biological interpretation and discusses the problems that have to be dealt with in data analysis in metabolomics.
Abstract: Metabolomics studies aim at a better understanding of biochemical processes by studying relations between metabolites and between metabolites and other types of information (e.g., sensory and phenotypic features). The objectives of these studies are diverse, but the types of data generated and the methods for extracting information from the data and analysing the data are similar. Besides instrumental analysis tools, various data-analysis tools are needed to extract this relevant information. The entire data-processing workflow is complex and has many steps. For a comprehensive overview, we cover the entire workflow of metabolomics studies, starting from experimental design and sample-size determination to tools that can aid in biological interpretation. We include illustrative examples and discuss the problems that have to be dealt with in data analysis in metabolomics. We also discuss where the challenges are for developing new methods and tailor-made quantitative strategies.

Journal ArticleDOI
TL;DR: Describes the Wings intelligent workflow system that assists scientists with designing computational experiments by automatically tracking constraints and ruling out invalid designs, letting scientists focus on their experiments and goals.
Abstract: Describes the Wings intelligent workflow system that assists scientists with designing computational experiments by automatically tracking constraints and ruling out invalid designs, letting scientists focus on their experiments and goals.

Proceedings Article
22 May 2011
TL;DR: This paper proposes to deal with problems using appropriate initialization for the early stages as well as convergence speedups applied throughout the learning phases of reinforcement learning, and presents the first experimental results for these.
Abstract: Dynamic and appropriate resource dimensioning is a crucial issue in cloud computing. As applications go more and more 24/7, online policies must be sought to balance performance with the cost of allocated virtual machines. Most industrial approaches to date use ad hoc manual policies, such as threshold-based ones. Providing good thresholds proved to be tricky and hard to automatize to fit every application requirement. Research is being done to apply automatic decision-making approaches, such as reinforcement learning. Yet, they face a lot of problems to go to the field: having good policies in the early phases of learning, time for the learning to converge to an optimal policy and coping with changes in the application performance behavior over time. In this paper, we propose to deal with these problems using appropriate initialization for the early stages as well as convergence speedups applied throughout the learning phases, and we present our first experimental results for these. We also introduce a performance model change detection on which we are currently working to complete the learning process management. Even though some of these proposals were known in the reinforcement learning field, the key contribution of this paper is to integrate them in a real cloud controller and to program them as an automated workflow.
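A minimal sketch of the general idea, not the authors' controller: Q-learning over (load, number-of-VMs) states with scale-in/out actions, where the Q-table is seeded from a simple threshold policy so that early decisions are already sensible. The state space, reward shape and constants are all invented for illustration.

```python
# Sketch: threshold-seeded Q-learning for VM scaling decisions (illustrative only).
LOADS, VMS, ACTIONS = range(5), range(1, 6), (-1, 0, +1)
ALPHA, GAMMA = 0.1, 0.9

def threshold_policy(load, vms):
    if load >= 4 and vms < 5: return +1        # scale out under high load
    if load <= 1 and vms > 1: return -1        # scale in when mostly idle
    return 0

# Seed Q so the threshold action looks slightly better than the alternatives.
Q = {(l, v): {a: (1.0 if a == threshold_policy(l, v) else 0.0) for a in ACTIONS}
     for l in LOADS for v in VMS}

def reward(load, vms):
    sla_penalty = 5.0 if load >= 4 and vms < 3 else 0.0
    return -(0.2 * vms) - sla_penalty          # pay for VMs, pay more for SLA misses

def q_update(state, action, next_state):
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward(*next_state) + GAMMA * best_next
                                 - Q[state][action])

# One illustrative interaction: high load with 2 VMs, the seeded policy adds a VM.
state = (4, 2)
action = max(Q[state], key=Q[state].get)       # greedy w.r.t. seeded table -> +1
q_update(state, action, next_state=(3, 3))
print(action, Q[state][action])
```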

Proceedings ArticleDOI
08 Jun 2011
TL;DR: This paper describes the experiences running a scientific workflow application developed to process astronomy data released by the Kepler project, a NASA mission to search for Earth-like planets orbiting other stars, and demonstrates how Pegasus was able to support sky computing by executing a single workflow across multiple cloud infrastructures simultaneously.
Abstract: Clouds are rapidly becoming an important platform for scientific applications. In this paper we describe our experiences running a scientific workflow application in the cloud. The application was developed to process astronomy data released by the Kepler project, a NASA mission to search for Earth-like planets orbiting other stars. This workflow was deployed across multiple clouds using the Pegasus Workflow Management System. The clouds used include several sites within the FutureGrid, NERSC's Magellan cloud, and Amazon EC2. We describe how the application was deployed, evaluate its performance executing in different clouds (based on Nimbus, Eucalyptus, and EC2), and discuss the challenges of deploying and executing workflows in a cloud environment. We also demonstrate how Pegasus was able to support sky computing by executing a single workflow across multiple cloud infrastructures simultaneously.

Journal ArticleDOI
01 Dec 2011
TL;DR: This work presents a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies.
Abstract: Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
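The contrast between coarse- and fine-grained provenance can be illustrated with a toy dependency graph: collapsing a module's internal steps (a ZoomOut-like view) makes every output appear to depend on every input, which is exactly the precision the fine-grained graph preserves. The names and graph encoding below are ours, not the Lipstick API.

```python
# Toy illustration of fine- vs coarse-grained provenance (illustrative names only).
# Edges record which inputs each output of a module really used.
fine_edges = [
    ("in1", "moduleA.step1"), ("moduleA.step1", "out1"),   # out1 uses only in1
    ("in2", "moduleA.step2"), ("moduleA.step2", "out2"),   # out2 uses only in2
]

def zoom_out(edges, module):
    """Collapse a module's internal nodes, keeping only data-to-data dependencies."""
    coarse = set()
    for src, dst in edges:
        src_internal = src.startswith(module + ".")
        dst_internal = dst.startswith(module + ".")
        if src_internal and not dst_internal:              # module -> output edge
            for s2, d2 in edges:                           # connect every module input
                if d2.startswith(module + ".") and not s2.startswith(module + "."):
                    coarse.add((s2, dst))
        elif not src_internal and not dst_internal:
            coarse.add((src, dst))
    return coarse

print(zoom_out(fine_edges, "moduleA"))
# The coarse view loses precision: both inputs now appear to feed both outputs.
```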

Journal ArticleDOI
TL;DR: It is found that the development of systems supporting individual clinical decisions is evolving toward the implementation of adaptable care pathways on the semantic web, incorporating formal, clinical, and organizational ontologies, and the use of workflow management systems.

Proceedings ArticleDOI
18 Nov 2011
TL;DR: The architecture of Airavata and its modules are discussed, and how the software can be used as individual components or as an integrated solution to build science gateways or general-purpose distributed application and workflow management systems are illustrated.
Abstract: In this paper, we introduce Apache Airavata, a software framework to compose, manage, execute, and monitor distributed applications and workflows on computational resources ranging from local resources to computational grids and clouds. Airavata builds on general concepts of service-oriented computing, distributed messaging, and workflow composition and orchestration. This paper discusses the architecture of Airavata and its modules, and illustrates how the software can be used as individual components or as an integrated solution to build science gateways or general-purpose distributed application and workflow management systems.

Journal ArticleDOI
TL;DR: The experiments show that the RD reputation improves the reliability of an application with more accurate reputations, while the LAGA provides better solutions than existing list heuristics and evolves to better solutions more quickly than a traditional GA.

Journal ArticleDOI
Bradley Jones, John Sall
TL;DR: JMP, as described in this paper, is a statistical software environment that enables scientists, engineers, and business analysts to make discoveries through data exploration. It supports custom design, an innovative approach to the statistical design of experiments, and whether results come from designed experiments or from an observational study, it provides analytical tools that put graphs up front.
Abstract: JMP is a statistical software environment that enables scientists, engineers, and business analysts to make discoveries through data exploration. One powerful method for beginning the process of discovery employs statistically designed experiments. A well-designed experiment ensures that the resulting data have large information content. We support this method with custom design, an innovative approach to the statistical design of experiments. But whether your results come from designed experiments or from an observational study, we provide analytical tools that put graphs up front. JMP's graphical user interface (GUI) makes these plots interactive and dynamically linked to each other and to the data. Moreover, in the design of JMP's user interface the priority was to make a smooth and natural workflow for data analysis. Notice how the data exploration flows in the following case study that investigates the relationship between life expectancy and health-care spending for 166 countries. WIREs Comp Stat 2011 3 188–194 DOI: 10.1002/wics.162

Journal ArticleDOI
TL;DR: The detailed knowledge engineering workflow is shown to be useful for structuring a complex iterative BN development process, and methods for incorporating it into the knowledge engineering process, including visualisation and analysis of the learned networks, are presented.

Journal ArticleDOI
01 Aug 2011
TL;DR: This work proposes an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows and demonstrates performance improvements of up to 226% compared to an ad-hoc workflow implementation.
Abstract: Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, thus requiring execution in large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad-hoc and labor-intensive, which makes it hard to exploit optimization opportunities. To address this problem, we propose an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows. We conducted a thorough validation of our approach using both a real oil exploitation application and synthetic data scenarios. The experiments were run in Chiron, a data-centric scientific workflow engine implemented to support our algebraic approach. Our experiments demonstrate performance improvements of up to 226% compared to an ad-hoc workflow implementation.
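To convey the flavour of an algebraic workflow representation, the toy operators below (our own, not the paper's algebra) express a workflow as composable expressions that can be rewritten before execution, for example pushing a filter below a map so an expensive activity runs on fewer tuples.

```python
# Sketch of a relational-algebra-like workflow algebra with one rewrite rule.
class Map:     # apply an activity to every tuple
    def __init__(self, fn, child): self.fn, self.child = fn, child
    def run(self, rows): return [self.fn(r) for r in self.child.run(rows)]

class Filter:  # keep tuples satisfying a predicate
    def __init__(self, pred, child): self.pred, self.child = pred, child
    def run(self, rows): return [r for r in self.child.run(rows) if self.pred(r)]

class Scan:
    def run(self, rows): return rows

def push_filter_below_map(expr):
    """Filter(Map(x)) -> Map(Filter(x)), valid when the predicate only reads input fields."""
    if isinstance(expr, Filter) and isinstance(expr.child, Map):
        return Map(expr.child.fn, Filter(expr.pred, expr.child.child))
    return expr

rows = [{"depth": d} for d in (100, 2500, 4000)]
simulate = lambda r: {**r, "pressure": r["depth"] * 0.1}       # stand-in for a costly activity
plan = Filter(lambda r: r["depth"] > 2000, Map(simulate, Scan()))
optimized = push_filter_below_map(plan)                         # runs simulate on fewer tuples
print(optimized.run(rows))
```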

Patent
01 Sep 2011
TL;DR: In this article, a role-based access control (RBAC) system is presented which simulates the organizational structure and workflow of a typical IT department to enable workflow management via the GUI for any component or function of a customer's virtual data center.
Abstract: Embodiments provide techniques for customers to easily, quickly and remotely manage their virtual data centers. Using, for example, a "single pane of glass" GUI view which shows all of the components (including e.g., machines (cpu and RAM), network services (load balancers, firewalls, network address translation, IP management) and storage) of their virtual data centers, provides a complete overview and a starting point for system or component management. According to embodiments, a Roles Based Access Control (RBAC) system is provided which simulates the organizational structure and workflow of a typical IT department to enable workflow management via the GUI for any component or function of a customer's virtual data center.
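A minimal sketch of the kind of role-based check such a system performs, with roles, permissions and users invented for illustration.

```python
# Toy RBAC check over virtual data center components (names are illustrative).
ROLE_PERMISSIONS = {
    "network_admin": {"firewall:edit", "load_balancer:edit", "vm:view"},
    "vm_operator":   {"vm:view", "vm:restart"},
    "auditor":       {"vm:view", "firewall:view"},
}
USER_ROLES = {"alice": {"network_admin"}, "bob": {"vm_operator", "auditor"}}

def can(user, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS[r] for r in USER_ROLES.get(user, ()))

print(can("alice", "firewall:edit"))   # True
print(can("bob", "firewall:edit"))     # False
```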

Journal ArticleDOI
TL;DR: The SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows, thus facilitating the intersection of Web services and Semantic Web technologies.
Abstract: The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community. SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers. SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies.
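At its simplest, class-based chaining means a client can assemble a pipeline by matching the OWL class one service produces to the class the next one consumes. The sketch below uses made-up service names and class identifiers and plain dictionaries instead of OWL reasoning.

```python
# Simplified sketch of chaining services by consumed/produced class (names invented).
SERVICES = [
    {"name": "getSequence", "consumes": "ex:ProteinRecord", "produces": "ex:Sequence"},
    {"name": "runBlast",    "consumes": "ex:Sequence",      "produces": "ex:BlastReport"},
    {"name": "rankHits",    "consumes": "ex:BlastReport",   "produces": "ex:RankedHits"},
]

def plan_chain(start_class, goal_class):
    chain, current = [], start_class
    while current != goal_class:
        nxt = next((s for s in SERVICES if s["consumes"] == current), None)
        if nxt is None:
            return None                      # no service advertises this input class
        chain.append(nxt["name"]); current = nxt["produces"]
    return chain

print(plan_chain("ex:ProteinRecord", "ex:RankedHits"))
# ['getSequence', 'runBlast', 'rankHits']
```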

Journal ArticleDOI
TL;DR: An open-source VM system based on the Google Maps engine, which allows students and faculty to collaboratively create content and annotate slides with markers and is enhanced with social networking features to give the community of learners more control over the system, has been an effective solution to the challenges facing traditional histopathology laboratories and the novel needs of the revised curriculum.
Abstract: Curricular reform efforts and a desire to use novel educational strategies that foster student collaboration are challenging the traditional microscope-based teaching of histology. Computer-based histology teaching tools and Virtual Microscopes (VM), computer-based digital slide viewers, have been shown to be effective and efficient educational strategies. We developed an open-source VM system based on the Google Maps engine to transform our histology education and introduce new teaching methods. This VM allows students and faculty to collaboratively create content and annotate slides with markers, and it is enhanced with social networking features to give the community of learners more control over the system. We currently have 1,037 slides in our VM system comprised of 39,386,941 individual JPEG files that take up 349 gigabytes of server storage space. Of those slides, 682 are for general teaching and available to our students and the public; the remaining 355 slides are used for practical exams and have restricted access. The system has seen extensive use, with 289,352 unique slide views to date. Students viewed an average of 563 slides per month during the histology course and accessed the system at all hours of the day. Of the 621 annotations added to 126 slides, 26.2% were added by faculty and 73.8% by students. The use of the VM system reduced the amount of time faculty spent administering the course by 210 hours, but did not reduce the number of laboratory sessions or the number of required faculty. Laboratory sessions were reduced from three hours to two hours each due to the efficiencies in the workflow of the VM system. Our virtual microscope system has been an effective solution to the challenges facing traditional histopathology laboratories and the novel needs of our revised curriculum. The web-based system allowed us to empower learners to have greater control over their content, as well as the ability to work together in collaborative groups. The VM system saved faculty time, and there was no significant difference in student performance on an identical practical exam before and after its adoption. We have made the source code of our VM freely available and encourage use of the publicly available slides on our website.
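A rough calculation shows why a tiled, Google Maps-style viewer generates so many JPEG files per slide: each zoom level of the image pyramid is cut into fixed-size tiles. The 256-pixel tile size and slide dimensions below are assumptions, not the authors' exact pipeline.

```python
# Back-of-the-envelope tile-pyramid count for one scanned slide (illustrative).
import math

def tile_count(width_px, height_px, tile=256):
    total, w, h = 0, width_px, height_px
    while True:
        total += math.ceil(w / tile) * math.ceil(h / tile)
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)   # next (coarser) zoom level
    return total

# A hypothetical 80,000 x 60,000 px slide yields on the order of 10^5 tiles.
print(tile_count(80_000, 60_000))
```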

Proceedings ArticleDOI
07 May 2011
TL;DR: In this paper, the authors present a new method for automating task and workflow design for high-level, complex tasks, which is recursive, recruiting workers from the crowd to help plan out how problems can be solved most effectively.
Abstract: Completing complex tasks on crowdsourcing platforms like Mechanical Turk currently requires significant up-front investment into task decomposition and workflow design. We present a new method for automating task and workflow design for high-level, complex tasks. Unlike previous approaches, our strategy is recursive, recruiting workers from the crowd to help plan out how problems can be solved most effectively. Our initial experiments suggest that this strategy can successfully create workflows to solve tasks considered difficult from an AI perspective, although it is highly sensitive to the design choices made by workers.
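The recursive structure can be sketched as a solve-or-split loop in which, in the real system, each decision, split and solution would be a crowdsourcing HIT; here the worker calls are faked so the control flow is visible.

```python
# Sketch of recursive crowd task decomposition (worker behaviour is simulated).
def fake_worker_decides(task):
    """Pretend a worker judges whether the task is simple enough to solve directly."""
    return len(task.split()) <= 4

def fake_worker_solves(task):
    return f"<solution to: {task}>"

def fake_worker_splits(task):
    words = task.split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]

def solve(task, depth=0):
    if fake_worker_decides(task):
        return fake_worker_solves(task)
    return [solve(sub, depth + 1) for sub in fake_worker_splits(task)]

print(solve("plan a three day trip to Tokyo including food and museums"))
```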

Patent
19 Apr 2011
TL;DR: In this paper, a system and technique for displaying a document's workflow history are disclosed, which includes a graphical user interface for displaying one or more graphical representations of events generated by an application configured to edit a document.
Abstract: A system and technique for displaying a document's workflow history are disclosed. The system includes a graphical user interface for displaying one or more graphical representations of events generated by an application configured to edit a document. Each of the events generated by the application may be stored in a data structure that is associated with one or more portions of the document. The data structure may also be associated with a digital image that reflects the state of the document at the time the event was generated and one or more frames of digital video captured substantially simultaneously with the generation of the event. The system may display the stored events via graphical representations in the graphical user interface that represent a portion of the total document workflow history. A user may navigate through the graphical events based on a hierarchical algorithm for clustering events.
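A sketch of the kind of event record and grouping the patent implies: each event carries a timestamp, the document region it touches and the captured media, and events are clustered into history segments when the gap between them is large. All field names and the gap heuristic are illustrative.

```python
# Toy edit-event structure and time-gap clustering for a workflow-history view.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EditEvent:
    timestamp: float                    # seconds since session start
    region: str                         # portion of the document the event touches
    snapshot: str                       # path to the image captured with the event
    video_frames: List[str] = field(default_factory=list)

def cluster_by_gap(events, max_gap=60.0):
    events = sorted(events, key=lambda e: e.timestamp)
    clusters, current = [], []
    for ev in events:
        if current and ev.timestamp - current[-1].timestamp > max_gap:
            clusters.append(current); current = []
        current.append(ev)
    if current:
        clusters.append(current)
    return clusters

history = [EditEvent(t, "page1", f"snap_{i}.png") for i, t in enumerate([0, 20, 400, 430])]
print([len(c) for c in cluster_by_gap(history)])   # [2, 2]
```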

Journal ArticleDOI
TL;DR: The paper summarizes the most advanced features of P-GRADE, such as parameter sweep workflow execution, multi-grid workflow execution and integration with the DSpace workflow repository, as well as introducing the second generation P-GRADE portal called WS-PGRADE.
Abstract: P-GRADE portal is one of the most widely used general-purpose grid portals in Europe. The paper summarizes the most advanced features of P-GRADE, such as parameter sweep workflow execution, multi-grid workflow execution and integration with the DSpace workflow repository. It also shows the NGS P-GRADE portal that extends P-GRADE with the GEMLCA legacy code execution support in Grid systems, as well as with coarse-grain workflow interoperability services. Next, the paper introduces the second generation P-GRADE portal called WS-PGRADE that merges the advanced features of the first generation P-GRADE portals and extends them with new workflow and architecture concepts. Finally, the application-specific science gateway of the CancerGrid project is briefly described to demonstrate that application-specific portals can easily be developed on top of the general-purpose WS-PGRADE portal.