SciSpace (formerly Typeset)
Author

Ben Clifford

Bio: Ben Clifford is an academic researcher from the University of Chicago. The author has contributed to research on scripting languages and petascale computing. The author has an h-index of 15 and has co-authored 22 publications that have received 2,475 citations. Previous affiliations include the National Center for Supercomputing Applications and Argonne National Laboratory.

Papers
Journal Article
TL;DR: This document contains the specification of the Open Provenance Model (v1.1), which resulted from a community effort to achieve interoperability in the Provenance Challenge series.

762 citations
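As a rough illustration of the kind of graph the Open Provenance Model describes, the sketch below records artifacts, processes and agents as nodes and checks the five core dependency edges (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom) against the node kinds they connect. It is a simplified Python sketch, not an implementation of the specification: roles, accounts, time annotations and serialization are omitted, and the names ProvGraph, add_node, add_edge and the example node ids are hypothetical.

```python
from dataclasses import dataclass, field

# Node kinds in the OPM core (simplified; hypothetical constant names).
ARTIFACT, PROCESS, AGENT = "artifact", "process", "agent"

# Core dependency edges: name -> (kind of effect, kind of cause).
# Edges point from the effect back to the cause it depends on.
EDGE_TYPES = {
    "used": (PROCESS, ARTIFACT),
    "wasGeneratedBy": (ARTIFACT, PROCESS),
    "wasControlledBy": (PROCESS, AGENT),
    "wasTriggeredBy": (PROCESS, PROCESS),
    "wasDerivedFrom": (ARTIFACT, ARTIFACT),
}


@dataclass
class ProvGraph:
    nodes: dict = field(default_factory=dict)  # node id -> kind
    edges: list = field(default_factory=list)  # (effect id, edge type, cause id)

    def add_node(self, node_id, kind):
        if kind not in (ARTIFACT, PROCESS, AGENT):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, effect, edge_type, cause):
        # Reject edges whose endpoints have the wrong kinds for this edge type.
        expected = EDGE_TYPES.get(edge_type)
        if expected is None:
            raise ValueError(f"unknown edge type: {edge_type}")
        actual = (self.nodes.get(effect), self.nodes.get(cause))
        if actual != expected:
            raise ValueError(f"{edge_type} expects {expected}, got {actual}")
        self.edges.append((effect, edge_type, cause))


# Usage: a process reads one file and writes another, run by a user.
g = ProvGraph()
g.add_node("raw.img", ARTIFACT)
g.add_node("align", PROCESS)
g.add_node("aligned.img", ARTIFACT)
g.add_node("alice", AGENT)
g.add_edge("align", "used", "raw.img")
g.add_edge("aligned.img", "wasGeneratedBy", "align")
g.add_edge("align", "wasControlledBy", "alice")
```

Checking endpoint kinds at insertion time keeps malformed statements, such as an artifact that "used" a process, out of the graph.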

Journal Article
01 Sep 2011
TL;DR: This work presents Swift's implicitly parallel and deterministic programming model, which applies external applications to file collections using a functional style that abstracts and simplifies distributed parallel execution.
Abstract: Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their use further increases programming complexity. The Swift parallel scripting language reduces these complexities by making file system structures accessible via language constructs and by allowing ordinary application programs to be composed into powerful parallel scripts that can efficiently utilize parallel and distributed resources. We present Swift's implicitly parallel and deterministic programming model, which applies external applications to file collections using a functional style that abstracts and simplifies distributed parallel execution.

421 citations
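The core idea in this abstract, applying an external program to every file in a collection through calls that are independent and therefore safe to run in parallel, can be approximated in Python as below. This is a hypothetical analogue, not Swift code: the upper-casing program stands in for a domain application, run_app and parallel_map are illustrative names, and a local thread pool replaces Swift's distributed runtime.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical "external application": a tiny program that upper-cases a file.
UPPER_APP = (
    "import sys, pathlib; "
    "pathlib.Path(sys.argv[2]).write_text(pathlib.Path(sys.argv[1]).read_text().upper())"
)


def run_app(src: Path, dst: Path) -> Path:
    """Invoke the external program on one input file, producing one output file."""
    subprocess.run([sys.executable, "-c", UPPER_APP, str(src), str(dst)], check=True)
    return dst


def parallel_map(app, inputs, out_dir: Path, workers: int = 4):
    """Apply `app` to every input file in parallel.

    Each call touches only its own input and output, so calls may finish in
    any order; Executor.map still yields results in input order.
    """
    outputs = [out_dir / (p.stem + ".out") for p in inputs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(app, inputs, outputs))


# Usage: three small input files in a scratch directory.
work = Path(tempfile.mkdtemp())
inputs = []
for i, text in enumerate(["alpha", "beta", "gamma"]):
    p = work / f"in{i}.txt"
    p.write_text(text)
    inputs.append(p)

results = parallel_map(run_app, inputs, work)
print([r.read_text() for r in results])
```

Because each invocation reads only its own input and writes only its own output, scheduling order cannot change the outcome, which is the property that makes implicit parallelism safe here.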

Proceedings Article
09 Jul 2007
TL;DR: Swift adopts and adapts ideas first explored in the GriPhyN virtual data system, improving on that system in many regards; the paper also describes application experiences and performance experiments that quantify the cost of Swift operations.
Abstract: We present Swift, a system that combines a novel scripting language called SwiftScript with a powerful runtime system based on CoG Karajan, Falkon, and Globus to allow for the concise specification, and reliable and efficient execution, of large loosely coupled computations. Swift adopts and adapts ideas first explored in the GriPhyN virtual data system, improving on that system in many regards. We describe the SwiftScript language and its use of XDTM to describe the logical structure of complex file system structures. We also present the Swift runtime system and its use of CoG Karajan, Falkon, and Globus services to dispatch and manage the execution of many tasks in parallel and grid environments. We describe application experiences and performance experiments that quantify the cost of Swift operations.

387 citations
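The abstract mentions using XDTM to describe the logical structure of file system data. The sketch below shows the general idea of such a mapping in Python: files on disk are presented to the program as an array of typed records, so a script manipulates Volume values rather than raw paths. The Volume type, the header/image pairing rule and map_volumes are hypothetical and for illustration only; they do not reproduce XDTM itself.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Volume:
    """Hypothetical logical type: one volume stored as a header/image file pair."""
    name: str
    header: Path
    image: Path


def map_volumes(directory: Path):
    """Hypothetical mapper: expose 'x.hdr' + 'x.img' pairs as a list of Volume.

    Headers without a matching image, and unrelated files, are skipped.
    """
    volumes = []
    for hdr in sorted(directory.glob("*.hdr")):
        img = hdr.with_suffix(".img")
        if img.exists():
            volumes.append(Volume(hdr.stem, hdr, img))
    return volumes


# Usage: vol3 has no image file and notes.txt is unrelated, so both are ignored.
work = Path(tempfile.mkdtemp())
for name in ["vol1.hdr", "vol1.img", "vol2.hdr", "vol2.img", "vol3.hdr", "notes.txt"]:
    (work / name).write_text("")

vols = map_volumes(work)
print([v.name for v in vols])
```

Keeping the physical layout rule inside one mapper means the rest of a script can change only if the logical type changes, not when files are renamed or reorganized.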

Proceedings Article
04 Jun 2004
TL;DR: The Grid2003 Project has deployed a multi-virtual-organization, application-driven grid laboratory that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN, the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling.
Abstract: The Grid2003 Project has deployed a multi-virtual-organization, application-driven grid laboratory ("Grid3") that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, work loads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. We describe the principles that have guided the development of this unique infrastructure and the practical experiences that have resulted from its creation and use. We discuss application requirements for grid services deployment and configuration, monitoring infrastructure, application performance, metrics, and operational experiences. We also summarize lessons learned.

138 citations

Journal Issue
TL;DR: A functional magnetic resonance imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed.
Abstract: The first Provenance Challenge was set up in order to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a functional magnetic resonance imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge, and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarize the participants' contributions. Copyright © 2007 John Wiley & Sons, Ltd.

119 citations
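Many provenance queries reduce to walking dependency edges backward from an output. The following Python sketch answers a lineage-style question ("what did this result depend on?") with a breadth-first traversal. The record names only loosely suggest an image-processing pipeline and are hypothetical; they are not the challenge's actual workflow data or query definitions.

```python
from collections import deque

# Hypothetical provenance records: each item maps to the items it was derived from.
DERIVED_FROM = {
    "atlas_graphic": ["atlas_slice"],
    "atlas_slice": ["atlas_image"],
    "atlas_image": ["resliced_1", "resliced_2"],
    "resliced_1": ["warp_1"],
    "resliced_2": ["warp_2"],
    "warp_1": ["anatomy_1", "reference"],
    "warp_2": ["anatomy_2", "reference"],
}


def lineage(item, derived_from):
    """Return every upstream item that `item` transitively depends on (BFS)."""
    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:  # shared ancestors are visited only once
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(lineage("resliced_1", DERIVED_FROM)))
```

The `seen` set matters when several branches share an input (here, `reference`): without it, the traversal would revisit that ancestor once per branch.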


Cited by
Proceedings Article
01 Nov 2008
TL;DR: In this article, the authors compare and contrast cloud computing with grid computing from various angles, give insights into the essential characteristics of both technologies, and compare the advantages of each.
Abstract: Cloud computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for cloud computing and there seems to be no consensus on what a cloud is. On the other hand, cloud computing is not a completely new concept; it has intricate connection to the relatively new but thirteen-year established grid computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general. This paper strives to compare and contrast cloud computing with grid computing from various angles and give insights into the essential characteristics of both.

3,132 citations

Book Chapter
30 Nov 2005
TL;DR: The principal characteristics of the latest release, the Web services-based GT4, which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality are summarized.
Abstract: The Globus Toolkit (GT) has been developed since the late 1990s to support the development of service-oriented distributed computing applications and infrastructures. Core GT components address, within a common framework, basic issues relating to security, resource access, resource management, data movement, resource discovery, and so forth. These components enable a broader “Globus ecosystem” of tools and components that build on, or interoperate with, core GT functionality to provide a wide range of useful application-level functions. These tools have in turn been used to develop a wide range of both “Grid” infrastructures and distributed applications. I summarize here the principal characteristics of the latest release, the Web services-based GT4, which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality.

1,509 citations

Proceedings Article
01 Jan 2003

1,212 citations

Journal Article
TL;DR: The Brain Imaging Data Structure (BIDS) is developed, a standard for organizing and describing MRI datasets that uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.
Abstract: The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.

1,037 citations
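To make the idea of a standard directory and file naming scheme concrete, the sketch below builds and parses BIDS-style paths such as sub-01/anat/sub-01_T1w.nii.gz, where key-value entities joined by underscores are followed by a suffix. It covers only a small subset of the standard (subject, session, task and run entities), performs no validation of suffixes or sidecar JSON files, and the function names bids_path and parse_entities are hypothetical.

```python
from pathlib import PurePosixPath


def bids_path(subject, datatype, suffix, session=None, task=None, run=None, ext=".nii.gz"):
    """Build a BIDS-style relative path for one imaging file (small subset only)."""
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    if task is not None:
        entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities + [suffix]) + ext

    # Folder hierarchy: subject, optional session, then data type (anat, func, ...).
    folders = [f"sub-{subject}"]
    if session is not None:
        folders.append(f"ses-{session}")
    return PurePosixPath(*folders, datatype, filename)


def parse_entities(filename):
    """Split a BIDS-style filename into its key-value entities and final suffix."""
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    return dict(p.split("-", 1) for p in pairs), suffix


print(bids_path("01", "anat", "T1w"))
print(bids_path("01", "func", "bold", session="pre", task="rest", run=1))
```

Because the metadata lives in the path itself, a pipeline can recover subject, session and task from `parse_entities` without a separate lookup table.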