
Showing papers by "Michael Wilde" published in 2009


Journal ArticleDOI
TL;DR: Parallel scripting extends this technique to allow for the rapid development of highly parallel applications that can run efficiently on platforms ranging from multicore workstations to petascale supercomputers.
Abstract: Scripting accelerates and simplifies the composition of existing codes to form more powerful applications. Parallel scripting extends this technique to allow for the rapid development of highly parallel applications that can run efficiently on platforms ranging from multicore workstations to petascale supercomputers.
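
The idea can be illustrated with a minimal sketch in plain Python (the paper itself concerns the Swift parallel scripting language, not Python): a script composes existing programs into a larger application, runs many invocations concurrently, and passes data between them through files. The executables "simulate" and "analyze" are hypothetical stand-ins for existing codes.

```python
# Minimal parallel-scripting sketch: run many invocations of an existing program
# concurrently and hand their file outputs to a downstream program.
# "simulate" and "analyze" are hypothetical placeholders for real executables.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_task(i: int) -> Path:
    """Invoke an existing code once, writing its result to a per-task file."""
    out = Path("out") / f"result_{i:04d}.txt"
    out.parent.mkdir(exist_ok=True)
    with out.open("w") as f:
        subprocess.run(["simulate", "--seed", str(i)], stdout=f, check=True)
    return out

# The scripting layer only expresses the parallelism; on a larger system the same
# structure would be mapped onto many cores or nodes by the runtime.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_task, range(100)))

subprocess.run(["analyze", *map(str, results)], check=True)
```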

110 citations


Journal ArticleDOI
01 Jul 2009
TL;DR: The applications that can benefit from parallel scripting on petascale-class machines are characterized, the mechanisms that make this feasible on such systems are described, and results achieved with parallel scripts on currently available petascale computers are presented.
Abstract: Parallel scripting is a loosely coupled programming model in which applications are composed of highly parallel scripts of program invocations that process and exchange data via files. We characterize here the applications that can benefit from parallel scripting on petascale-class machines, describe the mechanisms that make this feasible on such systems, and present results achieved with parallel scripts on currently available petascale computers.

28 citations


Proceedings ArticleDOI
14 Nov 2009
TL;DR: This work profiles the essential operations in the I/O workload for five loosely coupled scientific applications and offers an analysis to motivate and aid the development of programming tools, I/O subsystems, and filesystems.
Abstract: A large number of real-world scientific applications can be characterized as loosely coupled: the communication among tasks is infrequent and can be performed by using file operations. While these applications may be ported to large-scale machines designed for tightly coupled, massively parallel jobs, direct implementations do not perform well because of the large number of small, latency-bound file accesses. This problem may be overcome through the use of a variety of custom, hand-coded strategies applied at various subsystems of modern near-petascale computers, but this is a labor-intensive process that will become increasingly difficult at the petascale and beyond. This work profiles the essential operations in the I/O workload for five loosely coupled scientific applications. We characterize the I/O workload induced by these applications and offer an analysis to motivate and aid the development of programming tools, I/O subsystems, and filesystems.
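
To make the kind of per-task I/O profile described here concrete, the sketch below (an illustration, not the authors' instrumentation) tallies opens, reads, writes, and byte counts for the Python-level file I/O performed by one loosely coupled task; real profiles of such workloads would be gathered at the system-call or filesystem level.

```python
# Illustrative tally of one task's file operations, to expose the many small,
# latency-bound accesses the paper describes. Counts only Python-level I/O
# performed in this process while the counting wrapper is installed.
import builtins
from collections import Counter

stats = Counter()
_real_open = builtins.open

class CountingFile:
    def __init__(self, f):
        self._f = f
    def read(self, *args):
        data = self._f.read(*args)
        stats["reads"] += 1
        stats["bytes_read"] += len(data)
        return data
    def write(self, data):
        stats["writes"] += 1
        stats["bytes_written"] += len(data)
        return self._f.write(data)
    def __getattr__(self, name):
        return getattr(self._f, name)   # delegate close(), seek(), readline(), ...
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self._f.close()

def counting_open(path, mode="r", *args, **kwargs):
    stats["opens"] += 1
    return CountingFile(_real_open(path, mode, *args, **kwargs))

builtins.open = counting_open      # install the wrapper ...
# ... run one task's Python-level I/O here ...
builtins.open = _real_open         # ... then restore and report the tallies
print(dict(stats))                 # per-task counts of opens/reads/writes and bytes
```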

23 citations


Journal ArticleDOI
TL;DR: A computational framework suitable for a data-driven approach to structural equation modeling (SEM) is presented and several workflows for modeling functional magnetic resonance imaging (fMRI) data within this framework are described.
Abstract: We present a computational framework suitable for a data-driven approach to structural equation modeling (SEM) and describe several workflows for modeling functional magnetic resonance imaging (fMRI) data within this framework. The Computational Neuroscience Applications Research Infrastructure (CNARI) employs a high-level scripting language called Swift, which is capable of spawning hundreds of thousands of simultaneous R processes (R Core Development Team, 2008), consisting of self-contained structural equation models, on a high-performance computing (HPC) system. These self-contained R processing jobs are data objects generated by OpenMx, a plug-in for R, which can generate a single model object containing the matrices and algebraic information necessary to estimate parameters of the model. With such an infrastructure in place, a structural modeler may begin to investigate exhaustive searches of the model space. Specific applications of the infrastructure, statistics related to model fit, and limitations are discussed in relation to exhaustive SEM. In particular, we discuss how workflow management techniques can help to solve large computational problems in neuroimaging.
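
As a rough picture of what an exhaustive search over a model space looks like, the sketch below enumerates candidate sets of directed paths and fits each candidate in its own R process. It is a conceptual stand-in only: the paper's infrastructure uses Swift to dispatch the R/OpenMx jobs onto an HPC system, and the region names and the script name fit_model.R here are illustrative assumptions.

```python
# Conceptual exhaustive SEM search: one independent R process per candidate model.
# Region names and "fit_model.R" are hypothetical placeholders.
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

regions = ["IFG", "STG", "MTG", "AG"]                        # candidate ROIs (illustrative)
candidate_paths = list(itertools.permutations(regions, 2))   # all directed connections

def fit(model_id: int, paths) -> str:
    """Fit one self-contained structural equation model in its own R process."""
    spec = ";".join(f"{a}->{b}" for a, b in paths)
    result = subprocess.run(
        ["Rscript", "fit_model.R", "--paths", spec, "--out", f"fit_{model_id}.csv"],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()            # e.g. a line of fit statistics

# Exhaustive search over all path subsets of size 3 (kept small for the sketch);
# the workflow layer fans these out as independent jobs.
subsets = list(itertools.combinations(candidate_paths, 3))
with ThreadPoolExecutor(max_workers=16) as pool:
    summaries = list(pool.map(lambda args: fit(*args), enumerate(subsets)))
```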

18 citations


Journal ArticleDOI
TL;DR: The Computational Neuroscience Applications Research Infrastructure (CNARI) incorporates novel methods for maintaining, serving, and analyzing massive amounts of fMRI data; the authors believe that these advanced computational approaches will fundamentally change the future shape of cognitive brain imaging with fMRI.

15 citations


Proceedings ArticleDOI
11 Dec 2009
TL;DR: An automation tool, ADEM, is proposed for grid application software deployment and management, and experimental results on the Open Science Grid show that ADEM is easy to use and more productive for users than manual operation.
Abstract: In grid environments, the deployment and management of application software presents a major practical challenge for end users. Performing these tasks manually is error-prone and not scalable to large grids. In this work, we propose an automation tool, ADEM, for grid application software deployment and management, and demonstrate and evaluate the tool on the Open Science Grid. ADEM uses Globus for basic grid services, and integrates the grid software installer Pacman. It supports both centralized “prebuild” and on-site “dynamic-build” approaches to software compilation, using the NMI Build and Test system to perform central prebuilds for specific target platforms. ADEM's parallel workflow automatically determines available grid sites and their platform “signatures”, checks for and integrates dependencies, and performs software build, installation, and testing. ADEM's tracking log of build and installation activities is helpful for troubleshooting potential exceptions. Experimental results on the Open Science Grid show that ADEM is easy to use and more productive for users than manual operation.
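
The control flow this implies can be sketched as follows; it is only an illustration of the prebuild-versus-dynamic-build decision and the per-site parallelism, with hypothetical site names, signature strings, and package URLs. The actual tool drives Globus, Pacman, and the NMI Build and Test system rather than the stubs shown here.

```python
# Illustrative ADEM-style deployment flow: discover sites, derive a platform
# "signature", then install a central prebuild if one exists or fall back to an
# on-site dynamic build. All names and URLs below are hypothetical.
from concurrent.futures import ThreadPoolExecutor

SITES = ["osg.site-a.example.org", "osg.site-b.example.org"]            # hypothetical
PREBUILT = {"x86_64-el5-gcc4": "http://repo.example.org/app-x86_64-el5.tar.gz"}

def platform_signature(site: str) -> str:
    # In practice this would query the site (OS, architecture, compiler) through a
    # grid job; here it is a stub returning a fixed value.
    return "x86_64-el5-gcc4"

def deploy(site: str) -> str:
    sig = platform_signature(site)
    if sig in PREBUILT:
        plan = f"install prebuilt package {PREBUILT[sig]}"               # centralized prebuild
    else:
        plan = "dynamic-build: fetch sources, resolve dependencies, compile on site"
    # ... stage files / submit the build job to the site, then run its tests ...
    return f"{site} [{sig}]: {plan}"

with ThreadPoolExecutor() as pool:            # one deployment task per site, in parallel
    for line in pool.map(deploy, SITES):
        print(line)
```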

13 citations


Journal ArticleDOI
TL;DR: The Open Science Grid (OSG) enables new science, new scientists, and new modalities in support of computationally based research, and leverages its deliverables to the large-scale physics experiment member communities to benefit new communities at all scales through activities in education, engagement, and the distributed facility.
Abstract: The Open Science Grid (OSG) includes work to enable new science, new scientists, and new modalities in support of computationally based research. There are frequently significant sociological and organizational changes required in the transformation from the existing to the new. OSG leverages its deliverables to the large-scale physics experiment member communities to benefit new communities at all scales through activities in education, engagement, and the distributed facility. As a partner to the poster and tutorial at SciDAC 2008, this paper gives both a brief general description and some specific examples of new science enabled on the OSG. More information is available at the OSG web site: http://www.opensciencegrid.org.

5 citations


Proceedings ArticleDOI
08 Dec 2009
TL;DR: A runtime reputation-based grid resource selection algorithm is proposed that adapts dynamically to the runtime availability, load, and performance of the grid resources.
Abstract: Scheduling and executing grid applications is an important problem in grid environments. To achieve high reliability and efficiency, we propose a runtime reputation-based grid resource selection algorithm. From an accumulated raw score, the runtime reputation degree of a grid resource is quantified as an evaluation score while an application runs. Rather than depending on historical experience, it adapts dynamically to the runtime availability, load, and performance of the grid resources. The execution framework on the grid is based on the Globus Toolkit and the Swift system. On a real production grid, the Open Science Grid (OSG), we experimented with a typical grid application consisting of large-scale independent jobs based on BLAST. Experimental results for the performance of different policies are presented, with a benchmark workload of 10,000 jobs, along with runtime reputation and behavior statistics for the grid resources.
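
A minimal sketch of the runtime-reputation idea is shown below, assuming a simple illustrative scoring rule (the abstract does not give the exact raw-score formula): each site accumulates a raw score from job outcomes observed during the current run, and the next batch of jobs goes to the sites with the highest current reputation.

```python
# Runtime reputation sketch: scores come only from outcomes observed during this
# run, not from historical records. The scoring rule below is an assumption.
from collections import defaultdict

raw_score = defaultdict(float)
jobs_seen = defaultdict(int)

def record_outcome(site: str, succeeded: bool, runtime_s: float, queued_s: float):
    """Update a site's accumulated raw score from one completed (or failed) job."""
    jobs_seen[site] += 1
    if succeeded:
        raw_score[site] += 1.0 / (1.0 + queued_s / max(runtime_s, 1.0))  # reward responsive sites
    else:
        raw_score[site] -= 1.0                                           # penalize failures

def reputation(site: str) -> float:
    # Normalize so a site is judged on per-job behaviour, not on how many jobs it received.
    return raw_score[site] / jobs_seen[site] if jobs_seen[site] else 0.0

def select_sites(candidates, k: int):
    """Pick the k sites with the highest current runtime reputation for the next batch."""
    return sorted(candidates, key=reputation, reverse=True)[:k]
```

Because the score is rebuilt from outcomes within the run, a site that becomes overloaded or starts failing mid-run quickly stops receiving new jobs, which is the adaptivity the abstract emphasizes.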

2 citations


Proceedings ArticleDOI
25 Jun 2009
TL;DR: This paper describes how to use Swift to enable the on-demand execution of large-scale PSAs on the Open Science Grid (OSG) and presents experimental results for the performance of different policies.
Abstract: Large-scale parameter sweep applications (PSAs) are among the main classes of grid applications, and individual PSAs can have very different characteristics and demands. In this paper, we describe how to use Swift to enable the on-demand execution of large-scale PSAs on the Open Science Grid (OSG). The basic on-demand concept is to provide grid resources appropriate to the application, as determined by its characteristics and demands, so that large sets of independent PSA jobs can run on the OSG with high reliability, efficiency, and scalability. The main on-demand policies include trust-based site selection and pre-selection; on-demand configuration of the scheduling policy; clustering of small jobs; adaptive execution and automatic data staging; and divide-and-conquer for scalability. Usage examples of Swift for executing large-scale PSAs, such as DOCK and BLAST, are presented. Experimental results for the performance of the different policies are presented, with a benchmark workload of 10,000 jobs.
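
One of the listed policies, clustering of small jobs, can be sketched simply: group many short PSA invocations into batches so that each grid submission amortizes its scheduling and queuing overhead. The batch size and the command name run_sweep are illustrative assumptions, not Swift's actual clustering mechanism.

```python
# Job-clustering sketch: pack many short, independent PSA jobs into fixed-size
# batches so one grid submission carries many invocations.
# "run_sweep" is a hypothetical command; the batch size is an illustrative choice.
from typing import List

def cluster_jobs(commands: List[str], batch_size: int = 100) -> List[List[str]]:
    """Split a large list of independent job command lines into fixed-size batches."""
    return [commands[i:i + batch_size] for i in range(0, len(commands), batch_size)]

# 10,000 independent invocations, matching the benchmarking workload size.
jobs = [f"run_sweep --param {n}" for n in range(10_000)]
batches = cluster_jobs(jobs)          # 100 submissions of 100 jobs instead of 10,000
```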

1 citation