Showing papers in "Concurrency and Computation: Practice and Experience in 2005"


Journal ArticleDOI
TL;DR: Provides the history and philosophy of the Condor project and describes how it has interacted with other projects and evolved along with the field of distributed computing.
Abstract: Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational Grid. In this paper, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems. Copyright © 2005 John Wiley & Sons, Ltd.

1,969 citations


Journal ArticleDOI
TL;DR: The design and implementation of OGSA‐DAI, a service‐based architecture for database access over the Grid that allows consumers to discover the properties of structured data stores and to access their contents is described.
Abstract: Initially, Grid technologies were principally associated with supercomputer centres and large-scale scientific applications in physics and astronomy. They are now increasingly seen as being relevan...

345 citations


Journal ArticleDOI
TL;DR: The overall architecture of the ASKALON tool set is described and the basic functionality of the four constituent tools is outlined, enabling tool interoperability and demonstrating the usefulness and effectiveness of ASKALON by applying the tools to real‐world applications.
Abstract: Performance engineering of parallel and distributed applications is a complex task that iterates through various phases, ranging from modeling and prediction, to performance measurement, experiment ...

202 citations


Journal ArticleDOI
TL;DR: Ibis is a new programming environment that combines Java's ‘run everywhere’ portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object‐based communication.
Abstract: In computational Grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing Grid programming environments stems exactly from the dynamic availability of compute cycles: Grid programming environments (a) need to be portable to run on as many sites as possible, (b) they need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, while (c) they need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), or flexible (Jini, Java Remote Method Invocation (RMI)), or they are highly efficient (Message Passing Interface). No system combines all three properties that are necessary for Grid computing. In this paper, we present Ibis, a new programming environment that combines Java's ‘run everywhere’ portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to nine times higher throughputs with trees of objects. Copyright © 2005 John Wiley & Sons, Ltd.

167 citations


Journal ArticleDOI
TL;DR: The design and implementation of a software system that dynamically adjusts the parallelism of applications executing on computational Grids in accordance with the changing load characteristics of the underlying resources is discussed.
Abstract: Optimizing a given software system to exploit the features of the underlying system has been an area of research for many years. Recently, a number of self-adapting software systems have been designed and developed for various computing environments. In this paper, we discuss the design and implementation of a software system that dynamically adjusts the parallelism of applications executing on computational Grids in accordance with the changing load characteristics of the underlying resources. The migration framework implemented by our software system is aimed at performance-oriented Grid systems and implements tightly coupled policies for both suspension and migration of executing applications. The suspension and migration policies consider both the load changes on systems as well as the remaining execution times of the applications thereby taking into account both system load and application characteristics. The main goal of our migration framework is to improve the response times for individual applications. We also present some results that demonstrate the usefulness of our migration framework.

141 citations


Journal IssueDOI
TL;DR: Provides the history and philosophy of the Condor project and describes how it has interacted with other projects and evolved along with the field of distributed computing.
Abstract: Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational Grid. In this paper, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems. Copyright © 2005 John Wiley & Sons, Ltd.

141 citations


Journal ArticleDOI
TL;DR: An evolutionary roadmap that will allow us to capture generic middleware components from projects in a form that will facilitate migration or interoperability with the emerging Grid Web Services standards and with ongoing OGSA developments is set out.
Abstract: The UK e-Science Programme is a £250M, 5 year initiative which has funded over 100 projects. These application-led projects are under-pinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath the applications we call the ‘Grid’. Grid middleware is currently in transition from pre-Web Service versions to a new version based on Web Services. Unfortunately, only a very basic set of Web Services embodied in the Web Services Interoperability proposal, WS-I, are agreed by most IT companies. IBM and others have submitted proposals for Web Services for Grids - the Web Services Resource Framework and Web Services Notification specifications - to the OASIS organisation for standardisation. This process could take up to 12 months from March 2004 and the specifications are subject to debate and potentially significant changes. Since several significant UK e-Science projects come to an end before the end of this process, the UK therefore needs to develop a strategy that will protect the UK’s investment in Grid middleware by informing the Open Middleware Infrastructure Institute’s (OMII) roadmap and UK middleware repository in Southampton. This paper sets out an evolutionary roadmap that will allow us to capture generic middleware components from projects in a form that will facilitate migration or interoperability with the emerging Grid Web Services standards and with on-going OGSA developments. In this paper we therefore define a set of Web Services specifications - that we call ‘WS-I+’ to reflect the fact that this is a larger set than currently accepted by WS-I – that we believe will enable us to achieve the twin goals of capturing these components and facilitating migration to future standards. We believe that the extra Web Services specifications we have included in WS-I+ are both helpful in building e-Science Grids and likely to be widely accepted.

112 citations


Journal IssueDOI
TL;DR: The overall architecture of the ASKALON tool set is described and the basic functionality of the four constituent tools is outlined, including the PerformanceProphet, which enables the user to model and predict the performance of parallel applications at the early stages of development.
Abstract: Performance engineering of parallel and distributed applications is a complex task that iterates through various phases, ranging from modeling and prediction, to performance measurement, experiment management, data collection, and bottleneck analysis. There is no evidence so far that all of these phases should/can be integrated into a single monolithic tool. Moreover, the emergence of computational Grids as a common single wide-area platform for high-performance computing raises the idea to provide tools as interacting Grid services that share resources, support interoperability among different users and tools, and, most importantly, provide omnipresent services over the Grid. We have developed the ASKALON tool set to support performance-oriented development of parallel and distributed (Grid) applications. ASKALON comprises four tools, coherently integrated into a service-oriented architecture. SCALEA is a performance instrumentation, measurement, and analysis tool of parallel and distributed applications. ZENTURIO is a general purpose experiment management tool with advanced support for multi-experiment performance analysis and parameter studies. AKSUM provides semi-automatic high-level performance bottleneck detection through a special-purpose performance property specification language. The PerformanceProphet enables the user to model and predict the performance of parallel applications at the early stages of development. In this paper we describe the overall architecture of the ASKALON tool set and outline the basic functionality of the four constituent tools. The structure of each tool is based on the composition and sharing of remote Grid services, thus enabling tool interoperability. In addition, a data repository allows the tools to share the common application performance and output data that have been derived by the individual tools. A service repository is used to store common portable Grid service implementations. A general-purpose Factory service is employed to create service instances on arbitrary remote Grid sites. Discovering and dynamically binding to existing remote services is achieved through registry services. The ASKALON visualization diagrams support both online and post-mortem visualization of performance and output data. We demonstrate the usefulness and effectiveness of ASKALON by applying the tools to real-world applications. Copyright © 2005 John Wiley & Sons, Ltd.

98 citations


Journal ArticleDOI
TL;DR: Describes Triana, a distributed problem‐solving environment that makes use of the Grid to enable a user to compose applications from a set of components, select resources on which the composed application can be distributed and then execute the application on those resources.
Abstract: In this paper, we describe Triana, a distributed problem-solving environment that makes use of the Grid to enable a user to compose applications from a set of components, select resources on which the composed application can be distributed and then execute the application on those resources. We describe Triana's current pluggable architecture that can support many different modes of operation by the use of flexible writers for many popular Web service choreography languages. We further show that the Triana architecture is middleware-independent through the use of the Grid Application Toolkit (GAT) API and demonstrate this through the use of a GAT binding to JXTA. We describe how other bindings being developed to Grid infrastructures, such as OGSA, can seamlessly be integrated within the current prototype by using the switching capability of the GAT. Finally, we outline an experiment we conducted using this prototype and discuss its current status. Copyright © 2005 John Wiley & Sons, Ltd.

94 citations


Journal ArticleDOI
TL;DR: The design of the Parallel Ocean Program (POP) is described with an emphasis on portability, and analysis of POP performance across machines is used to characterize performance and identify improvements while maintaining portability.
Abstract: The design of the Parallel Ocean Program (POP) is described with an emphasis on portability. Performance of POP is presented on a wide variety of computational architectures, including vector archi...

82 citations


Journal ArticleDOI
TL;DR: The MEAD (Middleware for Embedded Adaptive Dependability) system attempts to identify and to reconcile the conflicts between real‐time and fault tolerance, in a resource‐aware manner, for distributed CORBA applications.
Abstract: The OMG’s Real-Time CORBA (RT-CORBA) and Fault-Tolerant CORBA (FT-CORBA) specifications make it possible for today’s CORBA implementations to exhibit either real-time or fault tolerance in isolation. While real-time requires a priori knowledge of the system’s temporal operation, fault tolerance necessarily deals with faults that occur unexpectedly, and with possibly unpredictable fault recovery times. The MEAD (Middleware for Embedded Adaptive Dependability) system attempts to identify and to reconcile the conflicts between real-time and fault tolerance, in a resource-aware manner, for distributed CORBA applications. MEAD supports transparent yet tunable fault tolerance in real-time, proactive dependability, resource-aware system adaptation to crash, communication and timing faults with bounded fault detection and fault recovery. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A Java profile for the development of software‐intensive high‐integrity real‐time systems is presented, which removes language features with high overheads and complex semantics, on which it is hard to perform timing and functional analyses.
Abstract: For many, Java is the antithesis of a high-integrity programming language. Its combination of object-oriented programming features, its automatic garbage collection, and its poor support for real-time multi-threading are all seen as particular impediments. The Real-Time Specification for Java has introduced many new features that help in the real-time domain. However, the expressive power of these features means that very complex programming models can be created, necessitating complexity in the supporting real-time virtual machine. Consequently, Java, with the real-time extensions as they stand, seems too complex for confident use in high-integrity systems. This paper presents a Java profile for the development of software-intensive high-integrity real-time systems. This restricted programming model removes language features with high overheads and complex semantics, on which it is hard to perform timing and functional analyses. The profile fits within the J2ME framework and is consistent with well-known guidelines for high-integrity software development, such as those defined by the U.S. Nuclear Regulatory Commission. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware is presented, which can be used as a template for the development of similar applications.
Abstract: Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid-enabled, high-throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid-enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini-Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid-enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.

Journal IssueDOI
TL;DR: Ibis is a new programming environment that combines Java's ‘run everywhere’ portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication.
Abstract: In computational Grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing Grid programming environments stems exactly from the dynamic availability of compute cycles: Grid programming environments (a) need to be portable to run on as many sites as possible, (b) they need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, while (c) they need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), or flexible (Jini, Java Remote Method Invocation (RMI)), or they are highly efficient (Message Passing Interface). No system combines all three properties that are necessary for Grid computing. In this paper, we present Ibis, a new programming environment that combines Java's ‘run everywhere’ portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to nine times higher throughputs with trees of objects. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: Simple heuristics which yield accurate trees for synthetic as well as real data and significantly reduce execution time are presented and RAxML‐II outperforms MrBayes for real‐world data both in terms of speed and final likelihood values.
Abstract: Inference of phylogenetic trees comprising hundreds or even thousands of organisms based on the maximum likelihood method is computationally intensive. We present simple heuristics which yield accurate trees for synthetic as well as real data and significantly reduce execution time. Those heuristics have been implemented in a sequential, parallel, and distributed program called RAxML-II, which is freely available as open source code. We compare the performance of the sequential program with PHYML and MrBayes which—to the best of our knowledge—are currently the fastest and most accurate programs for phylogenetic tree inference based on statistical methods. Experiments are conducted using 50 synthetic 100 taxon alignments as well as nine real-world alignments comprising 101 up to 1000 sequences. RAxML-II outperforms MrBayes for real-world data both in terms of speed and final likelihood values. Furthermore, for real data RAxML-II requires less time (a factor of 2–8) than PHYML to reach PHYML's final likelihood values and yields better final trees due to its more exhaustive search strategy. For synthetic data MrBayes is slightly more accurate than RAxML-II and PHYML but significantly slower. The non-deterministic parallel program shows good speedup values and has been used to infer a 10 000-taxon tree comprising organisms from the domains Eukarya, Bacteria, and Archaea. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
Michal Cierniak, Marsha Eng, Neal Glew, Brian T. Lewis, James M. Stichnoth
TL;DR: The structure of ORP is described in detail, paying particular attention to how it supports flexibility while preserving high performance; the interfaces between the garbage collector, the JIT, and the core VM; how these interfaces enable multiple garbage collectors and JITs without sacrificing performance; and how they allow the JIT and the core VM to reduce or eliminate MRTE‐specific performance issues.
Abstract: The Open Runtime Platform (ORP) is a high-performance managed runtime environment (MRTE) that features exact generational garbage collection, fast thread synchronization, and multiple coexisting just-in-time compilers (JITs). ORP was designed for flexibility in order to support experiments in dynamic compilation, garbage collection, synchronization, and other technologies. It can be built to run either Java or Common Language Infrastructure (CLI) applications, to run under the Windows or Linux operating systems, and to run on the IA-32 or Itanium processor family (IPF) architectures. Achieving high performance in a MRTE presents many challenges, particularly when flexibility is a major goal. First, to enable the use of different garbage collectors and JITs, each component must be isolated from the rest of the environment through a well-defined software interface. Without careful attention, this isolation could easily harm performance. Second, MRTEs have correctness and safety requirements that traditional languages such as C++ lack. These requirements, including null pointer checks, array bounds checks, and type checks, impose additional runtime overhead. Finally, the dynamic nature of MRTEs makes some traditional compiler optimizations, such as devirtualization of method calls, more difficult to implement or more limited in applicability. To get full performance, JITs and the core virtual machine (VM) must cooperate to reduce or eliminate (where possible) these MRTE-specific overheads. In this paper, we describe the structure of ORP in detail, paying particular attention to how it supports flexibility while preserving high performance. We describe the interfaces between the garbage collector, the JIT, and the core VM; how these interfaces enable multiple garbage collectors and JITs without sacrificing performance; and how they allow the JIT and the core VM to reduce or eliminate MRTE-specific performance issues. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper presents a formal framework of a distributed computation based on a publish/subscribe system that allows one to model concurrent execution of publication and subscription operations without waiting for the stability of the system state and to define a Liveness property which gives the conditions for the presence of a notification event in the global history of the system.
Abstract: This paper presents a formal framework of a distributed computation based on a publish/subscribe system. The framework abstracts the system through two delays, namely the subscription/unsubscription delay and the diffusion delay. This abstraction allows one to model concurrent execution of publication and subscription operations without waiting for the stability of the system state and to define a Liveness property which gives the conditions for the presence of a notification event in the global history of the system. This formal framework allows us to analytically define a measure of the effectiveness of a publish/subscribe system, which reflects the percentage of notifications guaranteed by the system to subscribers. A simulation study confirms the validity of the analytical measurements. Copyright © 2005 John Wiley & Sons, Ltd.

Journal IssueDOI
TL;DR: The design and implementation of OGSA-DAI, a service-based architecture for database access over the Grid designed to be extensible to accommodate different storage paradigms, is described and motivated.
Abstract: Initially, Grid technologies were principally associated with supercomputer centres and large-scale scientific applications in physics and astronomy. They are now increasingly seen as being relevant to many areas of e-Science and e-Business. The emergence of the Open Grid Services Architecture (OGSA), to complement the ongoing activity on Web Services standards, promises to provide a service-based platform that can meet the needs of both business and scientific applications. Early Grid applications focused principally on the storage, replication and movement of file-based data. Now the need for the full integration of database technologies with Grid middleware is widely recognized. Not only do many Grid applications already use databases for managing metadata, but increasingly many are associated with large databases of domain-specific information (e.g. biological or astronomical data). This paper describes the design and implementation of OGSA-DAI, a service-based architecture for database access over the Grid. The approach involves the design of Grid Data Services that allow consumers to discover the properties of structured data stores and to access their contents. The initial focus has been on support for access to Relational and XML data, but the overall architecture has been designed to be extensible to accommodate different storage paradigms. The paper describes and motivates the design decisions that have been taken, and illustrates how the approach supports a range of application scenarios. The OGSA-DAI software is freely available from . Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The UMM specification framework is presented, which integrates two key features to support memory model verification: it employs a simple and generic memory abstraction that can capture a large collection of memory models as guarded commands with a uniform notation and provides built‐in model checking capability to enable formal reasoning about thread behaviors.
Abstract: Given the complicated nature of modern shared memory systems, it is vital to have a systematic approach to specifying and analyzing memory consistency requirements. In this paper, we present the UMM specification framework, which integrates two key features to support memory model verification: (i) it employs a simple and generic memory abstraction that can capture a large collection of memory models as guarded commands with a uniform notation, and (ii) it provides built-in model checking capability to enable formal reasoning about thread behaviors. Using this framework, memory models can be specified in a parameterized style—designers can simply redefine a few bypassing rules and visibility ordering rules to obtain an executable specification of another memory model. We formalize several classical memory models, including Sequential Consistency, Coherence, and PRAM, to illustrate the general techniques of applying this framework. We then provide an alternative specification of the Java memory model, based on a proposal from Manson and Pugh, and demonstrate how to analyze Java thread semantics using model checking. We also compare our operational specification style with axiomatic specification styles and explore a mechanism that converts a memory model definition from one style to the other. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the authors describe the design, development and operation of a prototype of such an application that uses peer-to-peer interactions between distributed services and data on the Grid to enable the autonomic optimization of an oil reservoir.
Abstract: The emerging Grid infrastructure and its support for seamless and secure interactions is enabling a new generation of autonomic applications where the application components, Grid services, resources, and data interact as peers to manage, adapt and optimize themselves and the overall application. In this paper we describe the design, development and operation of a prototype of such an application that uses peer-to-peer interactions between distributed services and data on the Grid to enable the autonomic optimization of an oil reservoir. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The design and development of an MEG data analysis system leveraging Grid technologies, primarily Nimrod‐G, Gridbus, and Globus, are presented, and the composition of the neuroscience (brain‐activity analysis) application as a parameter‐sweep application and its on‐demand deployment on global Grids for distributed execution are described.
Abstract: The distribution of knowledge (by scientists) and data sources (advanced scientific instruments), and the need for large-scale computational resources for analyzing massive scientific data are two major problems commonly observed in scientific disciplines. Two popular scientific disciplines of this nature are brain science and high-energy physics. The analysis of brain-activity data gathered from the MEG (magnetoencephalography) instrument is an important research topic in medical science since it helps doctors in identifying symptoms of diseases. The data needs to be analyzed exhaustively to efficiently diagnose and analyze brain functions and requires access to large-scale computational resources. The potential platform for solving such resource intensive applications is the Grid. This paper presents the design and development of an MEG data analysis system by leveraging Grid technologies, primarily Nimrod-G, Gridbus, and Globus. It describes the composition of the neuroscience (brain-activity analysis) application as a parameter-sweep application and its on-demand deployment on global Grids for distributed execution. The results of economic-based scheduling of analysis jobs for three different optimization scenarios on the world-wide Grid testbed resources are presented along with their graphical visualization.

Journal ArticleDOI
TL;DR: This paper presents a docking algorithm based on molecular dynamics which has a highly flexible computational granularity and is applicable even to loosely coupled distributed systems such as desktop Grids for docking.
Abstract: Few methods use molecular dynamics simulations in concert with atomically detailed force fields to perform protein–ligand docking calculations because they are considered too time demanding, despite their accuracy. In this paper we present a docking algorithm based on molecular dynamics which has a highly flexible computational granularity. We compare the accuracy and the time required with well-known, commonly used docking methods such as AutoDock, DOCK, FlexX, ICM, and GOLD. We show that our algorithm is accurate, fast and, because of its flexibility, applicable even to loosely coupled distributed systems such as desktop Grids for docking. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A fault‐tolerant task scheduler integrates work stealing with an advanced form of eager scheduling that enables dynamic task decomposition, which improves host load‐balancing in the presence of tasks whose non‐uniform computational load is evident only at execution time.
Abstract: Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of their application can be cast as a master–worker or branch-and-bound computation, Javelin 3 frees application developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. The paper describes a fault-tolerant task scheduler and its performance analysis. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves host load-balancing in the presence of tasks whose non-uniform computational load is evident only at execution time. Speedup measurements are presented of actual performance on up to 1000 hosts. We analyze the expected performance degradation due to unresponsive hosts, and measure actual performance degradation due to unresponsive hosts. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The design of the Parallel Ocean Program (POP) is described with an emphasis on portability and analysis of POP performance across machines is used to characterize performance and identify improvements while maintaining portability.
Abstract: The design of the Parallel Ocean Program (POP) is described with an emphasis on portability. Performance of POP is presented on a wide variety of computational architectures, including vector architectures and commodity clusters. Analysis of POP performance across machines is used to characterize performance and identify improvements while maintaining portability. A new design of the POP model, including a cache blocking and land point elimination scheme, is described with some preliminary performance results. Published in 2005 by John Wiley & Sons, Ltd. This article is a U.S. Government work and is in the public domain in the U.S.A.

Journal ArticleDOI
TL;DR: This paper presents performance engineering techniques that aim to facilitate an efficient use of Grid systems, in particular systems that deal with the management of large‐scale data sets in the tera‐ and petabyte range (also referred to as data Grids).
Abstract: The vision of Grid computing is to facilitate worldwide resource sharing among distributed collaborations. With the help of numerous national and international Grid projects, this vision is becoming reality and Grid systems are attracting an ever increasing user base. However, Grids are still quite complex software systems whose efficient use is a difficult and error-prone task. In this paper we present performance engineering techniques that aim to facilitate an efficient use of Grid systems, in particular systems that deal with the management of large-scale data sets in the tera- and petabyte range (also referred to as data Grids). These techniques are applicable at different layers of a Grid architecture and we discuss the tools required at each of these layers to implement them. Having discussed important performance engineering techniques, we investigate how major Grid projects deal with performance issues particularly related to data Grids and how they implement the techniques presented. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This survey presents and compares the different approaches, while particularly focusing on the less well‐explored loosely coordinated time sharing, in terms of modification of standard operating systems, the runtime system and the communication libraries.
Abstract: Loosely coordinated (implicit/dynamic) coscheduling is a time-sharing approach that originates from network of workstations environments of mixed parallel/serial workloads and limited software support. It is meant to be an easy-to-implement and scalable approach. Considering that the percentage of clusters in parallel computing is increasing and easily portable software is needed, loosely coordinated coscheduling becomes an attractive approach for dedicated machines. Loose coordination offers attractive features as a dynamic approach. Static approaches for local job scheduling assign resources exclusively and non-preemptively. Such approaches still remain beyond the desirable resource utilization and average response times. Conversely, approaches for dynamic scheduling of jobs can preempt resources and/or adapt their allocation. They typically provide better resource utilization and response times. Existing dynamic approaches are full preemption with checkpointing, dynamic adaptation of node/CPU allocation, and time sharing via gang or loosely coordinated coscheduling. This survey presents and compares the different approaches, while particularly focusing on the less well-explored loosely coordinated time sharing. The discussion particularly focuses on the implementation problems, in terms of modification of standard operating systems, the runtime system and the communication libraries. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The study shows that some Java programs can benefit significantly from object inlining, with close to a 10% speedup, and one case, the db benchmark, where the most important inlinable field was the result of unusual program design, and fixing this small flaw led to both better performance and clearer program design.
Abstract: Object-oriented languages, such as Java, encourage the use of many small objects linked together by field references, instead of a few monolithic structures. While this practice is beneficial from a program design perspective, it can slow down program execution by incurring many pointer indirections. One solution to this problem is object inlining: when the compiler can safely do so, it fuses small objects together, thus removing the reads/writes to the removed field, saving the memory needed to store the field and object header, and reducing the number of object allocations. The objective of this paper is to measure the potential for object inlining by studying the run-time behaviour of a comprehensive set of Java programs. We study the traces of program executions in order to determine which fields behave like inlinable fields. Since we are using dynamic information instead of a static analysis, our results give an upper bound on what could be achieved via a static compiler-based approach. Our experimental results measure the potential improvements attainable with object inlining, including reductions in the numbers of field reads and writes, and reduced memory usage. Our study shows that some Java programs can benefit significantly from object inlining, with close to a 10% speedup. Somewhat to our surprise, our study found one case, the db benchmark, where the most important inlinable field was the result of unusual program design, and fixing this small flaw led to both better performance and clearer program design. However, the opportunities for object inlining are highly dependent on the individual program being considered, and are in many cases very limited. Furthermore, fields that are inlinable also have properties that make them potential candidates for other optimizations such as removing redundant memory accesses. The memory savings possible through object inlining are moderate. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The building blocks for a parallel‐adaptive scheme for the solution of time‐dependent and nonlinear partial differential equations and novel techniques to avoid hanging nodes are introduced, which assure conforming meshes of hybrid element type in three space dimensions.
Abstract: Advanced parallel applications based on the message-passing paradigm are difficult to design and implement, especially when solution adaptive techniques are used and three-dimensional problems on complex geometries are faced, which yield the use of unstructured Grids. We present the building blocks for a parallel-adaptive scheme for the solution of time-dependent and nonlinear partial differential equations. To minimize computational requirements, h-adaptivity is introduced via parallel, local Grid adaptation. Novel techniques to avoid hanging nodes are introduced; these assure conforming meshes of hybrid element type in three space dimensions. As a core of the adaptive scheme, local multigrid methods are used to solve the arising linear systems rapidly in parallel. Dynamic Grid changes from h-adaptivity lead to load imbalance during run time; therefore, dynamic load balancing and migration are performed to exploit the aggregated performance of large processor sets efficiently. Real-world calculations arising from density-driven flow problems in porous media are performed using the presented parallel-adaptive solution strategy. The computations are analyzed with regard to speedup. Timings of Grid adaptation, dynamic load balancing/migration and numerical solution scheme show that large-scale runs on 512 processors gain an overall parallel, numerical speedup of up to 278. A further reduction of the element count by h-adaptivity by a factor of up to 195 shows the enormous capabilities of the presented parallel-adaptive multigrid based solution scheme. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A software architecture to facilitate large‐scale simulation studies, involving ensembles of long‐running simulations and analysis of vast volumes of output data is described.
Abstract: The main goal of oil reservoir management is to provide more efficient, cost-effective and environmentally safer production of oil from reservoirs. Numerical simulations can aid in the design and implementation of optimal production strategies. However, traditional simulation-based approaches to optimizing reservoir management are rapidly overwhelmed by data volume when large numbers of realizations are sought using detailed geologic descriptions. In this paper, we describe a software architecture to facilitate large-scale simulation studies, involving ensembles of long-running simulations and analysis of vast volumes of output data. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A benchmark comparison between two industry well‐known MOMs—TIBCO Rendezvous (TIB/RV) and SonicMQ is presented to provide an unbiased benchmark reference to the middleware selection process.
Abstract: Message‐oriented middleware (MOM) has become a vital part of the complex application integration projects. MOM is used to pass data and workflow in the form of messages between different enterprise applications. The performance of integrated applications greatly depends on how effectively the MOM performs. This paper presents a benchmark comparison between two industry well‐known MOMs—TIBCO Rendezvous (TIB/RV) and SonicMQ. Although the two MOMs are very similar in certain respects, their native implementation and architecture are very different. We provide an unbiased benchmark reference to the middleware selection process. The primary objective of our work is to evaluate and compare the MOMs by testing their effectiveness in the delivery of messages in publish/subscribe and point‐to‐point message domains, their program stability and the system resource utilization. Copyright © 2005 John Wiley & Sons, Ltd.