Showing papers in "Concurrency and Computation: Practice and Experience in 2003"

PDF

Open Access

Journal Article•DOI•

The LINPACK Benchmark: past, present and future

[...]

Jack Dongarra¹, Piotr Luszczek¹, A. Petitet²•Institutions (2)

University of Tennessee¹, Sun Microsystems²

10 Aug 2003-Concurrency and Computation: Practice and Experience

TL;DR: Aside from the LINPACK Benchmark suite, the TOP500 and the HPL codes are presented and information is given on how to interpret the results of the benchmark and how the results fit into the performance evaluation process.

...read moreread less

Abstract: SUMMARY This paper describes the LINPACK Benchmark and some of its variations commonly used to assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the TOP500 and the HPL codes are presented. The latter is frequently used to obtained results for TOP500 submissions. Information is also given on how to interpret the results of the benchmark and how the results fit into the performance evaluation process. Copyright c � 2003 John Wiley & Sons, Ltd.

...read moreread less

787 citations

Journal Article•DOI•

Framework for testing multi‐threaded Java programs

[...]

Orit Edelstein¹, Eitan Farchi¹, Evgeny Goldin¹, Yarden Nir¹, Gil Ratsaby¹, Shmuel Ur¹ - Show less +2 more•Institutions (1)

IBM¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: This work presents a methodology for testing multi‐threaded programs which has minimal impact on the user and is likely to find interleaving bugs, and reruns existing tests in order to detect synchronization faults.

...read moreread less

Abstract: Finding bugs due to race conditions in multi-threaded programs is difficult, mainly because there are many possible interleavings, any of which may contain a fault. In this work we present a methodology for testing multi-threaded programs which has minimal impact on the user and is likely to find interleaving bugs. Our method reruns existing tests in order to detect synchronization faults. We find that a single test executed a number of times in a controlled environment may be as effective in finding synchronization faults as many different tests. A great deal of resources are saved since tests are very expensive to write and maintain. We observe that simply rerunning tests, without ensuring in some way that the interleaving will change, yields almost no benefits. We implement the methodology in our test generation tool—ConTest. ConTest combines the replay algorithm, which is essential for debugging, with our interleaving test generation heuristics. ConTest also contains an instrumentation engine, a coverage analyzer, and a race detector (not finished yet) that enhance bug detection capabilities. The greatest advantage of ConTest, besides finding bugs of course, is its minimal effect on the user. When ConTest is combined into the test harness, the user may not even be aware that ConTest is being used. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

149 citations

Journal Article•DOI•

NPACI Rocks: tools and techniques for easily deploying manageable Linux clusters

[...]

Philip M. Papadopoulos¹, Mason J. Katz¹, Greg Bruno¹•Institutions (1)

University of California, San Diego¹

01 Jun 2003-Concurrency and Computation: Practice and Experience

TL;DR: The NPACI Rocks distribution takes a fresh perspective on management and installation of clusters to dramatically simplify software version tracking and cluster integration.

...read moreread less

Abstract: High-performance computing clusters (commodity hardware with low-latency, high-bandwidth interconnects) based on Linux are rapidly becoming the dominant computing platform for a wide range of scientific disciplines. Yet, straightforward software installation, maintenance, and health monitoring for large-scale clusters has been a consistent and nagging problem for non-cluster experts. The NPACI Rocks distribution takes a fresh perspective on management and installation of clusters to dramatically simplify software version tracking and cluster integration. NPACI Rocks incorporates the latest Red Hat distribution (including security patches) with additional cluster-specific software. Using the identical software tools used to create the base distribution, users can customize and localize Rocks for their site. Strong adherence to widely-used (de facto) tools allows Rocks to move with the rapid pace of Linux development. Version 2.2.1 of the toolkit is available for download and installation. Over 100 Rocks clusters have been built by non-cluster experts at multiple institutions (residing in various countries) providing a peak aggregate of 2 TFLOPS of clustered computing. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

141 citations

Journal Article•DOI•

The Virtual Laboratory: a toolset to enable distributed molecular modelling for drug design on the World‐Wide Grid

[...]

Rajkumar Buyya¹, Kim Branson², J. Giddy³, David Abramson⁴•Institutions (4)

University of Melbourne¹, Walter and Eliza Hall Institute of Medical Research², Cardiff University³, Monash University, Caulfield campus⁴

01 Jan 2003-Concurrency and Computation: Practice and Experience

TL;DR: The development of a Virtual Laboratory environment by leveraging existing Grid technologies to enable molecular modelling for drug design on geographically distributed resources and developed new tools for enabling access to ligand records/molecules in the CDB from remote resources.

...read moreread less

Abstract: SUMMARY Computational Grids are emerging as a new paradigm for sharing and aggregation of geographically distributed resources for solving large-scale compute and data intensive problems in science, engineering and commerce. However, application development, resource management and scheduling in these environments is a complex undertaking. In this paper, we illustrate the development of a Virtual Laboratory environment by leveraging existing Grid technologies to enable molecular modelling for drug design on geographically distributed resources. It involves screening millions of compounds in the chemical database (CDB) against a protein target to identify those with potential use for drug design. We have used the Nimrod-G parameter specification language to transform the existing molecular docking application into a parameter sweep application for executing on distributed systems. We have developed new tools for enabling access to ligand records/molecules in the CDB from remote resources. The Nimrod-G resource broker along with molecule CDB data broker is used for scheduling and on-demand processing of docking jobs on the World-Wide Grid (WWG) resources. The results demonstrate the ease of use and power of the Nimrod-G and virtual laboratory tools for grid computing. Copyright c � 2003 John Wiley & Sons, Ltd.

...read moreread less

125 citations

Journal Article•DOI•

MPI-CHECK: a tool for checking Fortran 90 MPI programs

[...]

Glenn R. Luecke¹, Hua Chen¹, James Coyle¹, Jim Hoekstra¹, Marina Kraeva¹, Yan Zou¹ - Show less +2 more•Institutions (1)

Iowa State University¹

01 Feb 2003-Concurrency and Computation: Practice and Experience

TL;DR: MPI‐CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77, and provides automatic compile‐time and run‐time checking ofMPI programs.

...read moreread less

Abstract: SUMMARY MPI is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time and run-time checking of MPI programs. MPI-CHECK automatically detects the following problems in the use of MPI routines: (i) mismatch in argument type, kind, rank or number; (ii) messages which exceed the bounds of the source/destination array; (iii) negative message lengths; (iv) illegal MPI calls before MPI INIT or after MPI FINALIZE; (v) inconsistencies between the declared type of a message and its associated DATATYPE argument; and (vi) actual arguments which violate the INTENT attribute. Copyright c � 2003 John Wiley &S ons, Ltd.

...read moreread less

86 citations

Journal Article•DOI•

Modular specification of frame properties in JML

[...]

Peter Müller¹, Arnd Poetzsch-Heffter, Gary T. Leavens²•Institutions (2)

Deutsche Bank¹, Iowa State University²

01 Feb 2003-Concurrency and Computation: Practice and Experience

TL;DR: A modular specification technique for frame properties that uses modifies clauses and abstract fields with declared dependencies to guarantee modularity in the Java Modeling Language, JML.

...read moreread less

Abstract: We present a modular specification technique for frame properties. The technique uses modifies clauses and abstract fields with declared dependencies. Modularity is guaranteed by a programming model that enforces data abstraction by preventing representation and argument exposure, a semantics of modifies clauses that uses a notion of ‘relevant location’, and by modularity rules for dependencies. For concreteness, we adapt this technique to the Java Modeling Language, JML. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

78 citations

Journal Article•DOI•

Benchmarking Java against C and Fortran for scientific applications

[...]

J. M. Bull¹, Lorna Smith¹, C. Ball¹, L. Pottage¹, Robin Freeman¹ - Show less +1 more•Institutions (1)

University of Edinburgh¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: A subset of the Java Grande benchmarks has been re‐written in C and Fortran allowing direct performance comparisons between the three languages.

...read moreread less

Abstract: Increasing interest is being shown in the use of Java for scientific applications. The Java Grande benchmark suite was designed with such applications primarily in mind. The perceived lack of performance of Java still deters many potential users, despite recent advances in just-in-time and adaptive compilers. There are, however, few benchmark results available comparing Java to more traditional languages such as C and Fortran. To address this issue, a subset of the Java Grande benchmarks has been re-written in C and Fortran allowing direct performance comparisons between the three languages. The performance of a range of Java execution environments, C and Fortran compilers have been tested across a number of platforms using the suite. These demonstrate that on some platforms (notable Intel Pentium) the performance gap is now quite small. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

70 citations

Journal Article•

Sapphire: Copying GC Without Stopping theWorld

[...]

Eliot B. Moss

01 Jan 2003-Concurrency and Computation: Practice and Experience

TL;DR: The key innovations of Sapphire are the ability to “flip” one thread at a time (changing the thread's view from the old copies of objects to the new copies), as opposed to needing to stop all threads and flip them at the same time; and (2) avoiding a read barrier.

...read moreread less

57 citations

Journal Article•DOI•

Supporting dynamic parallel object arrays

[...]

Orion Sky Lawlor¹, Laxmikant V. Kale¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: Support for message delivery and collective operations in the face of dynamic behavior of generalized arrays of parallel data driven objects, which can migrate from processor to processor at any time, is discussed.

...read moreread less

Abstract: We present efficient support for generalized arrays of parallel data driven objects. Array elements are regular C++ objects, and are scattered across the parallel machine. An individual element is addressed by its ‘index’, which can be an arbitrary object rather than a simple integer. For example, an array index can be a series of numbers, supporting multidimensional sparse arrays; a bit vector, supporting collections of quadtree nodes; or a string. Methods can be invoked on any individual array element from any processor, and the elements can participate in reductions and broadcasts. Individual elements can be created or deleted dynamically at any time. Most importantly, the elements can migrate from processor to processor at any time. We discuss support for message delivery and collective operations in the face of such dynamic behavior. The migration capabilities of array elements have proven extremely useful, for example, in implementing flexible load balancing strategies and for exploiting workstation clusters adaptively. We present the design, an implementation, and performance results. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

41 citations

Journal Article•DOI•

Clusterfile: a flexible physical layout parallel file system

[...]

Florin Isaila¹, Walter F. Tichy¹•Institutions (1)

Karlsruhe Institute of Technology¹

01 Jun 2003-Concurrency and Computation: Practice and Experience

TL;DR: The paper shows how the file model can be employed for file partitioning into both physical subfiles and logical views and how the conversion between two partitions of the same file is implemented using a general memory redistribution algorithm.

...read moreread less

Abstract: This paper presents Clusterfile, a parallel file system that provides parallel file access on a cluster of computers. We introduce a file partitioning model that has been used in the design of Clusterfile. The model uses a data representation that is optimized for multidimensional array partitioning while allowing arbitrary partitions. The paper shows how the file model can be employed for file partitioning into both physical subfiles and logical views. We also present how the conversion between two partitions of the same file is implemented using a general memory redistribution algorithm. We show how we use the algorithm to optimize non-contiguous read and write operations. The experimental results include performance comparisons with the Parallel Virtual File System (PVFS) and an MPI-IO implementation for PVFS. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

40 citations

Journal Article•DOI•

SCALEA: a performance analysis tool for parallel programs

[...]

Hong-Linh Truong¹, Thomas Fahringer¹•Institutions (1)

University of Vienna¹

01 Sep 2003-Concurrency and Computation: Practice and Experience

TL;DR: This paper presents a meta-modelling framework that automates the very labor-intensive and therefore time-heavy and therefore expensive and expensive process of designing and testing performance analysis tools for code regions and performance metrics of interest.

...read moreread less

Abstract: Many existing performance analysis tools lack the flexibility to control instrumentation and performance measurement for code regions and performance metrics of interest. Performance analysis is commonly restricted to single experiments. In this paper we present SCALEA, which is a performance instrumentation, measurement, analysis, and visualization tool for parallel programs that supports post-mortem performance analysis. SCALEA currently focuses on performance analysis for OpenMP, MPI, HPF, and mixed parallel programs. It computes a variety of performance metrics based on a novel classification of overhead. SCALEA also supports multi-experiment performance analysis that allows one to compare and to evaluate the performance outcome of several experiments. A highly flexible instrumentation and measurement system is provided which can be controlled by command-line options and program directives. SCALEA can be interfaced by external tools through the provision of a full Fortran90 OpenMP/MPI/HPF frontend that allows one to instrument an abstract syntax tree at a very high-level with C-function calls and to generate source code. A graphical user interface is provided to view a large variety of performance metrics at the level of arbitrary code regions, threads, processes, and computational nodes for single- and multi-experiment performance analysis. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Weak communication in single‐hop radio networks: adjusting algorithms to industrial standards

[...]

Tomasz Jurdzinski, Mirosław Kutyłowski¹, Jan Zatopiański¹•Institutions (1)

Wrocław University of Technology¹

01 Sep 2003-Concurrency and Computation: Practice and Experience

TL;DR: Quite often algorithms designed for no‐collision‐detection radio networks use a hidden form of collision detection: it is assumed that a station can simultaneously send and listen.

...read moreread less

Abstract: Quite often algorithms designed for no-collision-detection radio networks use a hidden form of collision detection: it is assumed that a station can simultaneously send and listen. If it cannot hear its own message, apparently the message has been scrambled by another station sending at the same time. Industrial standard IEEE 802.11 says that a station can either send or listen to a radio channel at a given time, but not both. In order to relate the industrial standard and theoretical algorithms we consider a weak radio network model with no collision detection in which a station cannot simultaneously send and receive signals. Otherwise we talk about a strong model. In this paper we consider a measure called energy cost (or ‘power consumption’) which is equal to the maximum over all stations of the number of steps in which the station is sending or listening. We show that computational power of weak and strong single-hop radio networks differ substantially in the deterministic case: deterministic leader election requires energy cost in the weak model and can be solved by a practical algorithm with energy cost in the strong model. By contrast, we present a very efficient randomized simulation of strong radio networks by weak ones, with preprocessing that requires steps and has energy cost . Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Run‐time optimizations for a Java DSM implementation

[...]

Ronald Veldema¹, Rutger F. H. Hofman¹, Raoul Bhoedjang², Henri E. Bal¹•Institutions (2)

VU University Amsterdam¹, Cornell University²

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: Jackal is a fine‐grained distributed shared memory implementation of the Java programming language that allows multithreaded Java programs to run unmodified on distributed‐memory systems.

...read moreread less

Abstract: Jackal is a fine‐grained distributed shared memory implementation of the Java programming language. Jackal implements Java's memory model and allows multithreaded Java programs to run unmodified on distributed‐memory systems.

...read moreread less

Journal Article•DOI•

CCJ: Object-based Message Passing and Collective Communication in Java

[...]

Arnold Nelisse¹, Jason Maassen¹, Thilo Kielmann¹, Henri E. Bal¹•Institutions (1)

VU University Amsterdam¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: The results show that neither CCJ's object‐oriented design nor its implementation on top of RMI impose a performance penalty on applications compared to their mpiJava counterparts.

...read moreread less

Abstract: CCJ is a communication library that adds MPI-like message passing and collective operations to Java. Rather than trying to adhere to the precise MPI syntax, CCJ aims at a clean integration of communication into Java's object-oriented framework. For example, CCJ uses thread groups to support Java's multithreading model and it allows any data structure (not just arrays) to be communicated. CCJ is implemented entirely in Java, on top of RMI, so it can be used with any Java virtual machine. The paper discusses three parallel Java applications that use collective communication. It compares the performance (on top of a Myrinet cluster) of CCJ, RMI and mpiJava versions of these applications and also compares their code complexity. A detailed performance comparison between CCJ and mpiJava is given using the Java Grande Forum MPJ benchmark suite. The results show that neither CCJ's object-oriented design nor its implementation on top of RMI impose a performance penalty on applications compared to their mpiJava counterparts. The source of CCJ is available from our Web site http://www.cs.vu.nl/manta. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Java Virtual Machine support for object serialization

[...]

Fabian Breg¹, Constantine D. Polychronopoulos¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: A subset of Java's object serialization protocol in native code is implemented, using the Java Native Interface (JNI) and JVM internals, and it is shown that this approach is up to eight times faster than Java's original objectserialization protocol for array objects.

...read moreread less

Abstract: Distributed computing has become increasingly popular in the high-performance community. Java's remote method invocation (RMI) provides a simple, yet powerful method for implementing parallel algorithms. The performance of RMI has been less than adequate, however, and object serialization is often identified as a major performance inhibitor. We believe that object serialization is best performed in the Java Virtual Machine (JVM), where information regarding object layout and hardware communication resources are readily available. We implement a subset of Java's object serialization protocol in native code, using the Java Native Interface (JNI) and JVM internals. Experiments show that our approach is up to eight times faster than Java's original object serialization protocol for array objects. Also, for linked data structures our approach obtains a moderate speedup and better scalability. Evaluation of our object serialization implementation in an RMI framework indicates that a higher throughput can be obtained. Parallel applications, written using RMI, obtain better speedups and scalability when this more efficient object serialization is used. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

A distributed object infrastructure for interaction and steering

[...]

Rajeev Muralidhar¹, Manish Parashar¹•Institutions (1)

Rutgers University¹

25 Aug 2003-Concurrency and Computation: Practice and Experience

TL;DR: The design, implementation and experimental evaluation of DIOS (Distributed Interactive Object Substrate), an interactive object infrastructure to enable the runtime monitoring, interaction and computational steering of parallel and distributed applications, are presented.

...read moreread less

Abstract: This paper presents the design, implementation and experimental evaluation of DIOS (Distributed Interactive Object Substrate), an interactive object infrastructure to enable the runtime monitoring, interaction and computational steering of parallel and distributed applications. DIOS enables application objects (data structures, algorithms) to be enhanced with sensors and actuators so that they can be interrogated and controlled. Application objects may be distributed (spanning many processors) and dynamic (be created, deleted, changed or migrated). Furthermore, DIOS provides a control network that interconnects the interactive objects in a parallel/distributed application and enables external discovery, interrogation, monitoring and manipulation of these objects at runtime. DIOS is currently being used to enable interactive visualization, monitoring and steering of a wide range of scientific applications, including oil reservoir, compressible turbulence and numerical relativity simulations. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Spar: a set of extensions to Java for scientific computation

[...]

C. van Reeuwijk¹, F. Kuijlman¹, Henk Sips¹•Institutions (1)

Delft University of Technology¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: A set of language extensions that improve the expressiveness and performance of Java for scientific computation, including tuples, parameterized types, array subscript overloading, and the inline modifier are presented.

...read moreread less

Abstract: We present a set of language extensions that improve the expressiveness and performance of Java for scientific computation. The language extensions allow the manipulation of multi-dimensional arrays to be expressed more naturally, and to be implemented more efficiently. Furthermore, data-parallel programming is supported, allowing efficient parallelization of a large class of operations on arrays. We also provide language extensions to construct specialized array representations, such as symmetric, block, and sparse matrices. These extensions are: tuples, parameterized types, array subscript overloading, and the inline modifier. These extensions are not only useful in the construction of special array representations, but are also useful in their own right. Finally, we add complex numbers as a primitive type to the language. We evaluate our language extensions using performance results. We also compare relevant code fragments of our extended language with standard Java implementations and language extensions proposed by others. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Sapphire: copying garbage collection without stopping the world

[...]

Richard L. Hudson¹, Richard L. Hudson², J. Eliot B. Moss¹•Institutions (2)

University of Massachusetts Amherst¹, Intel²

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: Sapphire is a new algorithm for concurrent copying GC for Java that stresses minimizing the amount of time any given application thread may need to block to support the collector.

...read moreread less

Abstract: The growing use in concurrent systems of languages that require garbage collection (GC), such as Java, is raising practical interest in concurrent GC. Sapphire is a new algorithm for concurrent copying GC for Java. It stresses minimizing the amount of time any given application thread may need to block to support the collector. In particular, Sapphire is intended to work well in the presence of a large number of application threads, on small‐ to medium‐scale shared memory multiprocessors.

...read moreread less

Journal Article•DOI•

Achieving portable and efficient parallel CORBA objects

[...]

Alexandre Denis, Christian Pérez¹, Thierry Priol¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

25 Aug 2003-Concurrency and Computation: Practice and Experience

TL;DR: This work aims at bringing single program multiple data (SPMD) programming into CORBA in a portable way, and shows that portable parallel CORBA objects can efficiently make use of high‐performance networks.

...read moreread less

Abstract: With the availability of Computational Grids, new kinds of applications are emerging. They raise the problem of how to program them on such computing systems. In this paper, we advocate a programming model based on a combination of parallel and distributed programming models. Compared to previous approaches, this work aims at bringing SPMD programming into CORBA in a portable way. For example, we want to interconnect two parallel codes by CORBA without modifying either CORBA or the parallel communication API. We show that such an approach does not entail any loss of performance compared to previous approaches that required modification to the CORBA standard. Moreover, using an ORB that is able to exploit high performance networks, we show that portable parallel CORBA objects can efficiently make use of such networks.

...read moreread less

Journal Article•DOI•

CORBA request portable interceptors: Analysis and applications

[...]

Roberto Baldoni¹, Carlo Marchetti¹, Luigi Verde¹•Institutions (1)

Sapienza University of Rome¹

01 May 2003-Concurrency and Computation: Practice and Experience

TL;DR: In this paper, the authors report the results of their experiences with CORBA request portable interceptors and propose a proxy-based technique to overcome the interceptors' limitations, and conclude their work with a case study in which portable interceptor are used to implement the fault-tolerant CORBA client invocation semantic without impacting on the client application code and on the CORBA ORB.

...read moreread less

Abstract: Interceptors are an emerging middleware technology enabling the addition of specific network-oriented capabilities to distributed applications. By exploiting interceptors, developers can register code within interception points, extending the basic middleware mechanisms with specific functionality, e.g. authentication, flow control, caching, etc. Notably, these extensions can be achieved without modifying either the application or the middleware code. In this paper we report the results of our experiences with CORBA request portable interceptors. In particular, we point out (i) the basic mechanisms implementable by these interceptors, i.e. request redirection and piggybacking and (ii) we analyze their limitations. We then propose a proxy-based technique to overcome the interceptors' limitations. Successively, we present a performance analysis carried out on three Java-CORBA platforms currently implementing the portable interceptors specification. Finally, we conclude our work with a case study in which portable interceptors are used to implement the fault-tolerant CORBA client invocation semantic without impacting on the client application code and on the CORBA ORB. We also release fragments of Java code for implementing the described techniques. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

On the optimality of Feautrier's scheduling algorithm

[...]

Frédéric Vivien¹•Institutions (1)

École normale supérieure de Lyon¹

01 Sep 2003-Concurrency and Computation: Practice and Experience

TL;DR: It is shown that Feautrier's scheduling algorithm is not the most powerful existing algorithm for parallelism detection and extraction, and that for an algorithm to find more parallelism than this algorithm, one needs to remove some of the hypotheses underlying its framework.

...read moreread less

Abstract: Feautrier's scheduling algorithm is the most powerful existing algorithm for parallelism detection and extraction, but it has always been known to be suboptimal. However, the question as to whether it may miss some parallelism because of its design has not been answered. We show that this is not the case. Therefore, for an algorithm to find more parallelism than this algorithm, one needs to remove some of the hypotheses underlying its framework. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Kava: a Java dialect with a uniform object model for lightweight classes

[...]

David F. Bacon¹•Institutions (1)

IBM¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: Object‐oriented programming languages have always distinguished between ‘primitive’ and ‘user‐defined’ data types, and in the case of languages like C++ and Java the primitives are not even treated as objects, further fragmenting the programming model.

...read moreread less

Abstract: Object-oriented programming languages have always distinguished between ‘primitive’ and ‘user-defined’ data types, and in the case of languages like C++ and Java the primitives are not even treated as objects, further fragmenting the programming model. The distinction is especially problematic when a particular programming community requires primitive-level support for a new data type, as for complex, intervals, fixed-point numbers, and so on. We present Kava, a design for a backward-compatible version of Java that solves the problem of programmable lightweight objects in a much more aggressive and uniform manner than previous proposals. In Kava, there are no primitive types; instead, object-oriented programming is provided down to the level of single bits, and types such as int can be explicitly programmed within the language. While the language maintains a uniform object reference semantics, efficiency is obtained by making heavy use of unboxing and semantic expansion. We describe Kava as a dialect of the Java language, show how it can be used to define various primitive types, describe how it can be translated into Java, and compare it to other approaches to lightweight objects. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Scalable causal message logging for wide‐area environments

[...]

Karan Bhatia¹, Keith Marzullo², Lorenzo Alvisi³•Institutions (3)

University of California, San Diego¹, University of California, Los Angeles², University of Texas at Austin³

25 Aug 2003-Concurrency and Computation: Practice and Experience

TL;DR: Causal message logging protocols spread fault tolerance information around in the system providing high availability, which can also be used to replicate objects that are otherwise inaccessible because of network partitions.

...read moreread less

Abstract: Wide-area systems are gaining in popularity as an infrastructure for running scientific applications. From a fault tolerance perspective, these environments are challenging because of their scale and their variability. Causal message logging protocols have attractive properties that make them suitable for these environments. They spread fault tolerance information around in the system providing high availability. This information can also be used to replicate objects that are otherwise inaccessible because of network partitions. However, current causal message logging protocols do not scale to thousands or millions of processes. We describe the Hierarchical Causal Message Logging Protocol (HCML) that uses a hierarchy of shared logging sites, or proxies, to significantly reduce the space requirements as compared with existing protocols. These proxies also act as caches for fault tolerance information and reduce the overall message overhead by as much as 50%. HCML also leverages differences in bandwidth between processes that reduces overall message latency by as much as 97%. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Dynamic query scheduling in parallel data warehouses

[...]

Holger Märtens¹, Erhard Rahm², Thomas Stöhr²•Institutions (2)

Fachhochschule Braunschweig/Wolfenbüttel¹, Leipzig University²

01 Sep 2003-Concurrency and Computation: Practice and Experience

TL;DR: The specifics of load balancing in parallel data warehouses have not been addressed in detail, so this work addresses this issue in detail.

...read moreread less

Abstract: Parallel processing is a key to high performance in very large data warehouse applications that execute complex analytical queries on huge amounts of data. Although parallel database systems (PDBSs) have been studied extensively in the past decades, the specifics of load balancing in parallel data warehouses have not been addressed in detail. In this study, we investigate how the load balancing potential of a Shared Disk (SD) architecture can be utilized for data warehouse applications. We propose an integrated scheduling strategy that simultaneously considers both processors and disks, regarding not only the total workload on each resource but also the distribution of load over time. We evaluate the performance of the new method in a comprehensive simulation study and compare it to several other approaches. The analysis incorporates skew aspects and considers typical data warehouse features such as star schemas with large fact tables and bitmap indices. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

High‐performance Java codes for computational fluid dynamics

[...]

Christopher J. Riley, Siddhartha Chatterjee¹, Rupak Biswas²•Institutions (2)

IBM¹, Ames Research Center²

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: This paper presents the object‐oriented design and implementation of two real‐world applications from the field of computational fluid dynamics (CFD): a finite‐volume fluid flow solver (LAURA) and an unstructured mesh adaptation algorithm (2D_TAG).

...read moreread less

Abstract: The computational science community is reluctant to write large-scale computationally-intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

...read moreread less

Journal Article•DOI•

A comparison of concurrent programming and cooperative multithreading

[...]

Aaron W. Keen¹, Takashi Ishihara¹, Justin T. Maris¹, Tiejun Li¹, Eugene F. Fodor¹, Ronald A. Olsson¹ - Show less +2 more•Institutions (1)

University of California, Davis¹

01 Jan 2003-Concurrency and Computation: Practice and Experience

TL;DR: A comparison of the cooperative multithreading model with the general concurrent programming model focuses on the execution time performance of a range of standard concurrent programming applications and examines the tradeoffs in writing programs in the different programming styles.

...read moreread less

Abstract: This paper presents a comparison of the cooperative multithreading model with the general concurrent programming model. It focuses on the execution time performance of a range of standard concurrent programming applications. The overall results are mixed. In some cases, programs written in the cooperative multithreading model outperform those written in the general concurrent programming model. The contributions of this paper are twofold. First, it presents a thorough analysis of the performances of applications in the different models, i.e. to explain the criteria that determine when a program in one model will outperform an equivalent program in the other. Second, it examines the tradeoffs in writing programs in the different programming styles. In some cases, better performance comes at the cost of more complicated code. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Improving the official specification of Java bytecode verification

[...]

Alessandro Coglio

01 Feb 2003-Concurrency and Computation: Practice and Experience

TL;DR: This paper provides a comprehensive analysis of the specification of Bytecode verification, along with concrete suggestions for improvement.

...read moreread less

Abstract: SUMMARY Bytecode verificationis themain mechanismto ensuretypesafety intheJava VirtualMachine.Inadequacies in its official specification may lead to incorrect implementations where security can be broken and/or certain legal programs are rejected. This paper provides a comprehensive analysis of the specification, along with concrete suggestions for improvement. Copyright c � 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Platform independent dynamic Java virtual machine analysis: the Java Grande Forum benchmark suite

[...]

David Gregg¹, James F. Power², John Waldron¹•Institutions (2)

Trinity College, Dublin¹, Maynooth University²

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: A platform independent analysis of the dynamic profiles of Java programs when executing on the Java Virtual Machine is presented, describing the dynamic instruction usage frequencies, as well as the sizes of the local variable, parameter and operand stacks during execution on the JVM.

...read moreread less

Abstract: In this paper we present a platform independent analysis of the dynamic profiles of Java programs when executing on the Java Virtual Machine. The Java programs selected are taken from the Java Grande Forum benchmark suite and five different Java-to-bytecode compilers are analysed. The results presented describe the dynamic instruction usage frequencies, as well as the sizes of the local variable, parameter and operand stacks during execution on the JVM. These results, presenting a picture of the actual (rather than presumed) behaviour of the JVM, have implications both for the coverage aspects of the Java Grande benchmark suites, for the performance of the Java-to-bytecode compilers and for the design of the JVM. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

CartaBlanca—a pure‐Java, component‐based systems simulation tool for coupled nonlinear physics on unstructured grids—an update

[...]

W. B. VanderHeyden¹, Edward D. Dendy¹, N. T. Padial-Collins¹•Institutions (1)

Los Alamos National Laboratory¹

01 Mar 2003-Concurrency and Computation: Practice and Experience

TL;DR: A component‐based nonlinear physical system simulation prototyping package written entirely in Java using object‐oriented design that enables a clean component‐like implementation and the extension of the software to distributed‐memory computer systems.

...read moreread less

Abstract: This paper describes a component-based nonlinear physical system simulation prototyping package written entirely in Java using object-oriented design. The package provides scientists and engineers with a ‘developer-friendly’ software environment for large-scale computational algorithm and physical model development. The software design centers on the Jacobian-free Newton–Krylov solution method surrounding a finite-volume treatment of conservation equations. This enables a clean component-like implementation. We first provide motivation for the development of the software and then discuss software structure. The discussion includes a description of the use of Java's built-in thread facility that enables parallel, shared-memory computations on a wide variety of unstructured grids with triangular, quadrilateral, tetrahedral and hexahedral elements. We also discuss the use of Java's inheritance mechanism in the construction of a hierarchy of physics systems objects and linear and nonlinear solver objects that simplify development and foster software re-use. We provide a brief review of the Jacobian-free Newton–Krylov nonlinear system solution method and discuss how it fits into our design. Following this, we show results from example calculations and then discuss plans including the extension of the software to distributed-memory computer systems. Copyright © 2003 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Clustering revealed in high‐resolution simulations and visualization of multi‐resolution features in fluid–particle models

[...]

Krzysztof Boryczko, Witold Dzwinel, David A. Yuen¹•Institutions (1)

University of Minnesota¹

01 Feb 2003-Concurrency and Computation: Practice and Experience

TL;DR: An agglomerative clustering technique, based on the concept of a mutual nearest neighbor (MNN), that can be easily adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels is developed.

...read moreread less

Abstract: SUMMARY Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles consisting of between 10 6 and 10 9 particles, which cover 10 5 ‐10 6 time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, for one to look at a large data set at full resolution at all times is not possible and, in fact, not necessary. We have developed an agglomerative clustering technique, based on the concept of a mutual nearest neighbor (MNN). This procedure can be easily adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations, using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept extraction procedure, we may employ this clustering procedure to many other fields such as data mining, earthquake events and stellar populations in nebula clusters. Copyright c � 2003 John Wiley & Sons, Ltd.

...read moreread less