
Showing papers by "Michael Wilde" published in 2012


Proceedings ArticleDOI
13 May 2012
TL;DR: The evaluation using synthetic benchmarks shows that a workflow-aware storage system can bring significant performance gains: up to a 7× gain compared to the MosaStore distributed storage system and up to 16× compared to a central, well-provisioned NFS server.
Abstract: This paper evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently supporting workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing storage systems unable to harness all optimization opportunities, as doing so often requires conflicting optimization options or even conflicting design decisions at the level of the storage system. Second, when scheduling, workflow runtime engines make suboptimal decisions as they lack detailed data location information. This paper discusses the feasibility of, and evaluates the potential performance benefits brought by, building a workflow-aware storage system that supports per-file access optimizations and exposes data location. To this end, this paper presents approaches to determine application-specific data access patterns, and evaluates experimentally the performance gains of a workflow-aware storage approach. Our evaluation using synthetic benchmarks shows that a workflow-aware storage system can bring significant performance gains: up to a 7× gain compared to the MosaStore distributed storage system and up to 16× compared to a central, well-provisioned NFS server.
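Editor's note: as a rough illustration of the per-file optimization idea described in this abstract (not the authors' implementation or MosaStore's API), the sketch below tags each workflow file with an access-pattern hint that a storage layer could use to choose a placement policy. The pattern names and policy mapping are assumptions.

```python
# Hypothetical sketch: per-file access-pattern hints driving storage policy.
# Pattern names and policies are illustrative assumptions, not MosaStore's API.
from dataclasses import dataclass

@dataclass
class FileHint:
    path: str
    pattern: str  # e.g. "pipeline", "broadcast", "reduce"

def placement_policy(hint: FileHint) -> dict:
    """Map an access pattern to a plausible per-file storage policy."""
    if hint.pattern == "pipeline":      # producer and consumer on the same node
        return {"placement": "node-local", "replication": 1}
    if hint.pattern == "broadcast":     # one producer, many consumers
        return {"placement": "replicate", "replication": 4}
    if hint.pattern == "reduce":        # many producers, one consumer
        return {"placement": "collocate-on-consumer", "replication": 1}
    return {"placement": "striped", "replication": 1}   # default

if __name__ == "__main__":
    hints = [FileHint("stage1/out.dat", "pipeline"),
             FileHint("common/index.db", "broadcast")]
    for h in hints:
        print(h.path, placement_policy(h))
```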

34 citations


Proceedings ArticleDOI
20 May 2012
TL;DR: This work presents the architecture of Turbine, a new highly scalable and distributed many-task dataflow engine that executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures.
Abstract: Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely-coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly-coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution, and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
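Editor's note: the many-task dataflow model described here can be illustrated with a toy sketch (this is not Turbine's intermediate representation or API): tasks are declared with explicit data dependencies and each one runs as soon as its inputs are ready.

```python
# Toy dataflow execution sketch in the spirit of many-task engines such as Turbine.
# The task graph, names, and scheduling policy are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

def run_dataflow(tasks, deps, max_workers=4):
    """tasks: {name: callable(*inputs)}, deps: {name: [input task names]}.
    Each task is submitted once all of its inputs have been submitted and
    starts when their results are available."""
    futures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        def submit(name):
            if name in futures:
                return futures[name]
            inputs = [submit(d) for d in deps.get(name, [])]
            # Note: blocking on inputs inside workers can deadlock for deep
            # graphs with few workers; a real engine tracks readiness explicitly.
            futures[name] = pool.submit(
                lambda ins=inputs: tasks[name](*[f.result() for f in ins]))
            return futures[name]
        for name in tasks:
            submit(name)
        return {name: f.result() for name, f in futures.items()}

if __name__ == "__main__":
    tasks = {"a": lambda: 2, "b": lambda: 3, "sum": lambda x, y: x + y}
    deps = {"sum": ["a", "b"]}
    print(run_dataflow(tasks, deps)["sum"])   # prints 5
```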

28 citations


Journal ArticleDOI
TL;DR: This work presents a free modeling method for predicting the local structure of loops and large InsEnds in both crystal structures and template-based models; the method ranks as one of the best in the CASP9 refinement category, which involves improving template-based models so that they can function as molecular replacement models to solve the phase problem for crystallographic structure determination.
Abstract: Template-based methods for predicting protein structure provide models for a significant portion of the protein but often contain insertions or chain ends (InsEnds) of indeterminate conformation. The local structure prediction "problem" entails modeling the InsEnds onto the rest of the protein. A well-known limit involves predicting loops of ≤12 residues in crystal structures. However, InsEnds may contain as many as ~50 amino acids, and the template-based model of the protein itself may be imperfect. To address these challenges, we present a free modeling method for predicting the local structure of loops and large InsEnds in both crystal structures and template-based models. The approach uses single amino acid torsional angle "pivot" moves of the protein backbone with a Cβ-level representation. Nevertheless, our accuracy for loops is comparable to existing methods. We also apply a more stringent test, the blind structure prediction and refinement categories of the CASP9 tournament, where we improve the quality of several homology-based models by modeling InsEnds as long as 45 amino acids, sizes generally inaccessible to existing loop prediction methods. Our approach ranks as one of the best in the CASP9 refinement category, which involves improving template-based models so that they can function as molecular replacement models to solve the phase problem for crystallographic structure determination.
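Editor's note: for readers unfamiliar with pivot moves, the following is a minimal geometric sketch (not the authors' code): a single backbone torsion is perturbed by rotating every residue downstream of the chosen bond about that bond's axis. The coarse one-point-per-residue chain and the move size are assumptions.

```python
# Minimal sketch of a single-torsion "pivot" move on a coarse-grained chain.
# Representation and move size are illustrative assumptions, not the paper's model.
import numpy as np

def rotate_about_axis(points, origin, axis, angle):
    """Rotate points about a line through `origin` along `axis` (Rodrigues' formula)."""
    axis = axis / np.linalg.norm(axis)
    p = points - origin
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rotated = (p * cos_a
               + np.cross(axis, p) * sin_a
               + np.outer(p @ axis, axis) * (1.0 - cos_a))
    return rotated + origin

def pivot_move(chain, i, angle):
    """Rotate all residues downstream of bond (i, i+1) about that bond's axis."""
    new_chain = chain.copy()
    origin, axis = chain[i], chain[i + 1] - chain[i]
    new_chain[i + 2:] = rotate_about_axis(chain[i + 2:], origin, axis, angle)
    return new_chain

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chain = np.cumsum(rng.normal(size=(20, 3)), axis=0)   # toy 20-residue chain
    moved = pivot_move(chain, i=8, angle=np.deg2rad(15.0))
    print(np.abs(moved - chain).max())                    # only downstream residues moved
```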

26 citations


Journal ArticleDOI
TL;DR: This work contributes MTCProv, a provenance query framework for many-task scientific computing that captures the runtime execution details of MTC workflow tasks on parallel and distributed systems, in addition to standard prospective and data derivation provenance.
Abstract: Scientific research is increasingly assisted by computer-based experiments. Such experiments are often composed of a vast number of loosely-coupled computational tasks that are specified and automated as scientific workflows. This large scale is also characteristic of the data that flows within such "many-task" computations (MTC). Provenance information can record the behavior of such computational experiments via the lineage of process and data artifacts. However, work to date has focused on lineage data models, leaving unsolved issues of recording and querying other aspects, such as domain-specific information about the experiments, MTC behavior given by resource consumption and failure information, or the impact of environment on performance and accuracy. In this work we contribute MTCProv, a provenance query framework for many-task scientific computing that captures the runtime execution details of MTC workflow tasks on parallel and distributed systems, in addition to standard prospective and data derivation provenance. To help users query provenance data we provide a high-level interface that hides relational query complexities. We evaluate MTCProv using an application in protein science, and describe how important query patterns such as correlations between provenance, runtime data, and scientific parameters are simplified and expressed.
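Editor's note: to make the idea of a high-level provenance query interface concrete, here is a hedged sketch. The schema, table names, and helper function are invented for illustration and are not MTCProv's actual model; the point is simply that a correlation query (parameter vs. runtime) can be exposed without the user writing the relational join.

```python
# Illustrative sketch of a high-level provenance query helper.
# Schema, table names, and API are assumptions, not MTCProv's actual design.
import sqlite3

SCHEMA = """
CREATE TABLE task_run   (task_id TEXT PRIMARY KEY, app TEXT, runtime_s REAL, site TEXT);
CREATE TABLE task_param (task_id TEXT, name TEXT, value TEXT);
"""

def correlate(conn, app, param):
    """Return (parameter value, mean runtime) pairs for one application,
    hiding the underlying relational join from the user."""
    return conn.execute(
        """SELECT p.value, AVG(r.runtime_s)
           FROM task_run r JOIN task_param p ON p.task_id = r.task_id
           WHERE r.app = ? AND p.name = ?
           GROUP BY p.value""", (app, param)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO task_run VALUES (?,?,?,?)",
                     [("t1", "fold", 120.0, "siteA"), ("t2", "fold", 95.0, "siteB")])
    conn.executemany("INSERT INTO task_param VALUES (?,?,?)",
                     [("t1", "temperature", "300"), ("t2", "temperature", "310")])
    print(correlate(conn, "fold", "temperature"))
```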

22 citations


Posted Content
TL;DR: This report discusses many-task computing generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012.
Abstract: This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.
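Editor's note: the point about minimizing task dispatch overhead can be made concrete with a back-of-the-envelope calculation. The numbers below are illustrative assumptions, not figures from the report: the dispatch rate needed to keep a machine busy grows with core count and shrinks with mean task duration.

```python
# Back-of-the-envelope MTC dispatch-rate estimate; all numbers are illustrative.
def required_dispatch_rate(cores, mean_task_seconds):
    """Tasks/second the runtime must sustain to keep every core busy."""
    return cores / mean_task_seconds

def utilization(dispatch_capacity, cores, mean_task_seconds):
    """Fraction of the machine kept busy given a dispatcher throughput limit."""
    needed = required_dispatch_rate(cores, mean_task_seconds)
    return min(1.0, dispatch_capacity / needed)

if __name__ == "__main__":
    cores, task_s = 300_000, 30.0   # hypothetical petascale partition, short tasks
    print(required_dispatch_rate(cores, task_s))   # 10,000 tasks/s needed
    print(utilization(dispatch_capacity=2_000, cores=cores, mean_task_seconds=task_s))
    # A 2,000 tasks/s dispatcher keeps only ~20% of such a machine busy.
```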

21 citations


Proceedings ArticleDOI
19 Jun 2012
TL;DR: The challenges of reducing the time-to-solution of the data-intensive earthquake simulation workflow "CyberShake" are addressed by supplementing the high-performance parallel computing (HPC) resources on which it typically runs with distributed, heterogeneous resources that can be obtained opportunistically from grids and clouds.
Abstract: In this paper, we address the challenges of reducing the time-to-solution of the data-intensive earthquake simulation workflow "CyberShake" by supplementing the high-performance parallel computing (HPC) resources on which it typically runs with distributed, heterogeneous resources that can be obtained opportunistically from grids and clouds. We seek to minimize time to solution by maximizing the amount of work that can be efficiently done on the distributed resources. We identify data movement as the main bottleneck in effectively utilizing the combined local and distributed resources. We address this by analyzing the I/O characteristics of the application, the processor acquisition rate (from a pilot-job service), and the data movement throughput of the infrastructure. With these factors in mind, we explore a combination of strategies including partitioning of computation (over HPC and distributed resources) and job clustering. We validate our approach with a theoretical study and with preliminary measurements on the Ranger HPC system and distributed Open Science Grid resources. More complete performance results will be presented in the final submission of this paper.
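Editor's note: a simplified model of the partitioning decision described above might look like the following sketch (the rates, costs, and balance criterion are assumed parameters, not the authors' model): choose the fraction of work sent to distributed resources so that remote compute time plus data-movement time is balanced against the HPC side.

```python
# Toy time-to-solution model for splitting work between HPC and distributed resources.
# All rates and the balance criterion are illustrative assumptions.
def time_to_solution(frac_remote, total_work, hpc_rate, remote_rate,
                     bytes_per_unit, wan_throughput):
    """Both partitions run concurrently; remote work also pays for data movement."""
    t_hpc = (1.0 - frac_remote) * total_work / hpc_rate
    t_remote = (frac_remote * total_work / remote_rate
                + frac_remote * total_work * bytes_per_unit / wan_throughput)
    return max(t_hpc, t_remote)

if __name__ == "__main__":
    # Sweep the remote fraction and pick the best split under the assumed parameters.
    best = min(((f / 100.0,
                 time_to_solution(f / 100.0, total_work=1e6, hpc_rate=50.0,
                                  remote_rate=30.0, bytes_per_unit=2e6,
                                  wan_throughput=1e8))
                for f in range(0, 101, 5)), key=lambda x: x[1])
    print("best remote fraction %.2f -> %.0f s" % best)
```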

6 citations


Proceedings ArticleDOI
10 Nov 2012
TL;DR: A new data-parallel library is created, the Parallel Gridded Analysis Library (ParGAL), which can read in data using parallel I/O, store the data on a complete representation of the structured or unstructured mesh, and perform sophisticated analysis on the data in parallel.
Abstract: Climate models are both outputting larger and larger amounts of data and doing so on more sophisticated numerical grids. The tools climate scientists have used to analyze climate output, an essential component of climate modeling, are single-threaded and assume rectangular structured grids in their analysis algorithms. We are bringing both task- and data-parallelism to the analysis of climate model output. We have created a new data-parallel library, the Parallel Gridded Analysis Library (ParGAL), which can read in data using parallel I/O, store the data on a complete representation of the structured or unstructured mesh, and perform sophisticated analysis on the data in parallel. ParGAL has been used to create a parallel version of a script-based analysis and visualization package. Finally, we have also taken current workflows and employed task-based parallelism to decrease the total execution time.
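Editor's note: as a hedged illustration of data-parallel analysis over a decomposed mesh (not ParGAL's API), the sketch below computes a global area-weighted mean from per-rank mesh partitions using MPI reductions; the partitioning, field, and cell areas are assumptions.

```python
# Hedged sketch of data-parallel climate analysis over a partitioned mesh.
# Uses mpi4py reductions; the decomposition and field are illustrative, not ParGAL's API.
import numpy as np
from mpi4py import MPI

def global_area_weighted_mean(local_values, local_areas, comm):
    """Each rank holds the cells of its mesh partition; reduce to a global mean."""
    local_weighted = float(np.sum(local_values * local_areas))
    local_area = float(np.sum(local_areas))
    total_weighted = comm.allreduce(local_weighted, op=MPI.SUM)
    total_area = comm.allreduce(local_area, op=MPI.SUM)
    return total_weighted / total_area

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    rng = np.random.default_rng(comm.Get_rank())
    values = rng.normal(loc=288.0, scale=5.0, size=10_000)   # e.g. surface temperature (K)
    areas = rng.uniform(0.5, 1.5, size=10_000)               # cell areas for this partition
    mean = global_area_weighted_mean(values, areas, comm)
    if comm.Get_rank() == 0:
        print("global mean:", mean)
    # Run with: mpiexec -n 4 python this_script.py
```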

1 citation