
Showing papers in "Concurrency and Computation: Practice and Experience in 1999"


Journal ArticleDOI
TL;DR: This paper investigates the claim that functional languages offer low-cost parallelism in the context of symbolic programs on modest parallel architectures, and presents the first comparative study of the construction of large applications in a parallel functional language, in this case Glasgow Parallel Haskell (GPH).
Abstract: We investigate the claim that functional languages offer low-cost parallelism in the context of symbolic programs on modest parallel architectures. In our investigation we present the first comparative study of the construction of large applications in a parallel functional language, in our case in Glasgow Parallel Haskell (GPH). The applications cover a range of application areas, use several parallel programming paradigms, and are measured on two very different parallel architectures. On the applications level the most significant result is that we are able to achieve modest wall-clock speedups (between factors of 2 and 10) over the optimised sequential versions for all but one of the programs. Speedups are obtained even for programs that were not written with the intention of being parallelised. These gains are achieved with relatively small programmer effort. One reason for the relative ease of parallelisation is the use of evaluation strategies, a new parallel programming technique that separates the algorithm from the co-ordination of parallel behaviour. On the language level we show that the combination of lazy and parallel evaluation is useful for achieving a high level of abstraction. In particular we can describe top-level parallelism, and also preserve module abstraction by describing parallelism over the data structures provided at the module interface (‘data-oriented parallelism’). Furthermore, we find that the determinism of the language is helpful, as is the largely implicit nature of parallelism in GPH. Copyright © 1999 John Wiley & Sons, Ltd.

38 citations





Journal ArticleDOI
TL;DR: A software system is presented for the management of geographically distributed high-performance computers that co-ordinates the co-operative use of resources in autonomous computing sites.
Abstract: We present a software system for the management of geographically distributed high-performance computers. It consists of three components:
1. The Computing Center Software (CCS) is a vendor-independent resource management software for local HPC systems. It controls the mapping and scheduling of interactive and batch jobs on massively parallel systems.
2. The Resource and Service Description (RSD) is used by CCS for specifying and mapping hardware and software components of (meta-)computing environments. It has a graphical user interface, a textual representation and an object-oriented API.
3. The Service Coordination Layer (SCL) co-ordinates the co-operative use of resources in autonomous computing sites. It negotiates between the applications' requirements and the available system services.

26 citations


Journal ArticleDOI
TL;DR: Efficiency is achieved by the concept of a self-optimising class library of primitive image processing operations, which allows programs to be written in a high level, algebraic notation and which is automatically parallelised (using an application-specific data parallel approach).
Abstract: This paper describes a domain specific programming model for execution on parallel and distributed architectures. The model has initially been targeted at the application area of image processing, though the techniques developed may be more generally applicable to other domains where an algebraic or library-based approach is common. Efficiency is achieved by the concept of a self-optimising class library of primitive image processing operations, which allows programs to be written in a high level, algebraic notation and which is automatically parallelised (using an application-specific data parallel approach). The class library is extended automatically with optimised operations, generated by a transformation system, giving improved execution performance. The parallel implementation of the model described here is based on MPI and has been tested on a C40 processor network, a quad-processor Unix workstation, and a network of PCs running Linux. Timings are included to indicate the impact of the automatic optimisation facility (rather than the effect of parallelisation). Copyright © 1999 John Wiley & Sons, Ltd.

20 citations
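The abstract describes the approach at a high level; the flavour of a self-optimising library of primitive image operations can be sketched as follows. Simple per-pixel operators are written separately, and the transformation system in effect replaces a composed sequence with a single fused pass over the image. The function names (img_add, img_threshold, img_add_threshold) are illustrative assumptions, not the paper's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Two primitive per-pixel operations, as a library of image operators
 * might expose them (names are illustrative only). */
void img_add(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned s = a[i] + b[i];
        out[i] = s > 255 ? 255 : (uint8_t)s;        /* saturating add */
    }
}

void img_threshold(const uint8_t *in, uint8_t *out, size_t n, uint8_t t) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] >= t ? 255 : 0;
}

/* The kind of fused operation a transformation system could generate for
 * the composition threshold(add(a, b), t): one pass, no temporary image. */
void img_add_threshold(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n, uint8_t t) {
    for (size_t i = 0; i < n; i++) {
        unsigned s = a[i] + b[i];
        if (s > 255) s = 255;
        out[i] = s >= t ? 255 : 0;
    }
}
```

The fused version avoids the intermediate buffer and the second traversal; distributing such fused operations over image partitions is where the data-parallel speedup would come from.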


Journal ArticleDOI
TL;DR: A significant improvement in the total execution time and a reduction in the number of message contentions are illustrated, and it is proved that the generalized hypercube is a very versatile interconnection network.
Abstract: This paper presents results of evaluating the communications capabilities of the generalized hypercube interconnection network. The generalized hypercube has outstanding topological properties, but it has not been implemented on a large scale because of its very high wiring complexity. For this reason, this network has not been studied extensively in the past. However, recent and expected technological advancements will soon render this network viable for massively parallel systems. We first present implementations of randomized many-to-all broadcasting and multicasting on generalized hypercubes, using as the basis the one-to-all broadcast algorithm presented by Fragopoulou et al. (1996). We test the proposed implementations under realistic communication traffic patterns and message generations, for the all-port model of communication. Our results show that the size of the intermediate message buffers has a significant effect on the total communication time, and this effect becomes very dramatic for large systems with large numbers of dimensions. We also propose a modification of this multicast algorithm that applies congestion control to improve its performance. The results illustrate a significant improvement in the total execution time and a reduction in the number of message contentions, and also prove that the generalized hypercube is a very versatile interconnection network. Copyright © 1999 John Wiley & Sons, Ltd.

18 citations
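For readers unfamiliar with the topology: under the usual definition, nodes of a generalized hypercube carry mixed-radix addresses and two nodes are adjacent exactly when their addresses differ in a single digit position, by any amount, which is what drives the high wiring complexity. The small sketch below enumerates a node's neighbours under that definition; the radix layout and routine names are our own illustration, not the paper's code.

```c
#include <stdio.h>

#define DIMS 3

/* Radix (number of node positions) in each dimension; a small
 * GH(4,3,2) example with 4*3*2 = 24 nodes. */
static const int radix[DIMS] = {4, 3, 2};

/* Print every neighbour of the node whose mixed-radix address is given
 * in addr[]: neighbours differ from it in exactly one digit. */
static void list_neighbours(const int addr[DIMS]) {
    for (int d = 0; d < DIMS; d++) {
        for (int v = 0; v < radix[d]; v++) {
            if (v == addr[d]) continue;             /* skip the node itself */
            printf("neighbour via dim %d: (", d);
            for (int k = 0; k < DIMS; k++)
                printf("%d%s", k == d ? v : addr[k], k + 1 < DIMS ? "," : ")\n");
        }
    }
}

int main(void) {
    int node[DIMS] = {2, 1, 0};
    list_neighbours(node);      /* degree = (4-1)+(3-1)+(2-1) = 6 links */
    return 0;
}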




Journal ArticleDOI
TL;DR: The use of unstructured adaptive tetrahedral meshes in the solution of transient flows poses a challenge for parallel computing due to the irregular and frequently changing nature of the data and its distribution; a parallel mesh adaptation algorithm, PTETRAD, is described and analysed to address this.
Abstract: The use of unstructured adaptive tetrahedral meshes in the solution of transient flows poses a challenge for parallel computing due to the irregular and frequently changing nature of the data and its distribution. A parallel mesh adaptation algorithm, PTETRAD, for unstructured tetrahedral meshes (based on the serial code TETRAD) is described and analysed. The portable implementation of the parallel code in C with MPI is described and discussed. The scalability of the code is considered, analysed and illustrated by numerical experiments using a shock wave diffraction problem.

13 citations


Journal ArticleDOI
TL;DR: In this paper, a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks are described. These transformations allow the compiler to move constructs that acquire and release locks both within and between procedures and to eliminate acquire/release constructs.
Abstract: As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software. This paper describes a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks. These transformations allow the compiler to move constructs that acquire and release locks both within and between procedures and to eliminate acquire and release constructs. The paper also presents a new synchronization algorithm, lock elimination, for reducing synchronization overhead. This optimization locates computations that repeatedly acquire and release the same lock, then uses the transformations to obtain equivalent computations that acquire and release the lock only once. Experimental results from a parallelizing compiler for object-based programs illustrate the practical utility of this optimization. For three benchmark programs the optimization dramatically reduces the number of times the computations acquire and release locks, which significantly reduces the amount of time processors spend acquiring and releasing locks. For one of the three benchmarks, the optimization always significantly improves the overall performance. Depending on the number of processors executing the computation, the optimized version runs between 2.11 and 1.83 times faster than the unoptimized version. For one of the other benchmarks, the optimized version runs between 1.13 and 0.96 times faster than the unoptimized version, with a mean of 1.08 times faster. For the final benchmark, the optimization reduces the overall performance.
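The core transformation is easy to picture. A minimal before/after sketch with a POSIX mutex is shown below; it is our own example of the repeated-acquire pattern the optimization targets, not the paper's compiler output or benchmarks.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static double sum = 0.0;

/* Before: the same lock is acquired and released once per element. */
void accumulate_unoptimized(const double *a, int n) {
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&m);
        sum += a[i];
        pthread_mutex_unlock(&m);
    }
}

/* After: the transformed computation acquires and releases the lock only
 * once, hoisting it out of the loop.  This is legal here because nothing
 * else in the loop body synchronizes; establishing that is exactly what
 * the compiler framework must do before applying the transformation. */
void accumulate_optimized(const double *a, int n) {
    pthread_mutex_lock(&m);
    for (int i = 0; i < n; i++)
        sum += a[i];
    pthread_mutex_unlock(&m);
}
```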


Journal ArticleDOI
TL;DR: This work derives the optimal mapping and scheduling of tiles to physical processors under some reasonable assumptions, in the context of limited computational resources and assuming communication-computation overlap.
Abstract: In the framework of fully permutable loops, tiling is a compiler technique (also known as ‘loop blocking’) that has been extensively studied as a source-to-source program transformation. Little work has been devoted to the mapping and scheduling of the tiles on to physical parallel processors. We present several new results in the context of limited computational resources and assuming communication-computation overlap. In particular, under some reasonable assumptions, we derive the optimal mapping and scheduling of tiles to physical processors. Copyright © 1999 John Wiley & Sons, Ltd.
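The source-to-source transformation itself is standard and easy to show; what the paper studies is how the resulting tiles should be mapped and scheduled. The sketch below is a textbook blocked loop nest, with the tile size B as a free parameter and N chosen divisible by B for simplicity; it is a generic illustration, not the paper's example.

```c
#define N 1024
#define B 32                       /* tile (block) size, a tunable parameter */

/* Original fully permutable loop nest. */
void relax(double a[N][N], const double b[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 0.5 * (a[i][j] + b[i][j]);
}

/* Tiled ("loop blocked") version: the iteration space is cut into B x B
 * tiles; each tile becomes a schedulable unit that can be mapped to a
 * physical processor, which is the problem the paper addresses. */
void relax_tiled(double a[N][N], const double b[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)
                for (int j = jj; j < jj + B; j++)
                    a[i][j] = 0.5 * (a[i][j] + b[i][j]);
}
```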



Journal ArticleDOI
TL;DR: This work presents a technique for controlling applied communication load in networks of workstations that achieves high communication throughput and minimises its variance.
Abstract: The BSP cost model measures the cost of communication using a single architectural parameter, g, which measures permeability of the network to continuous traffic. Architectures such as networks of workstations pose particular problems for high-performance communication because it is hard to achieve high communication throughput, and even harder to do so predictably. Yet both of these are required for BSP to be effective. We present a technique for controlling applied communication load that achieves both. Traffic is presented to the communication network at a rate chosen to maximise throughput and minimise its variance. Significant performance improvements can be achieved compared to unstructured communication over the same transport protocols as in the case of, for example, MPI. Copyright © 1999 John Wiley & Sons, Ltd.
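The key idea is that traffic is presented to the network at a deliberately chosen rate rather than as fast as possible. The fragment below is only a crude sketch of such pacing, spacing fixed-size sends with nanosleep to hold the applied load near a target rate; it is not the authors' protocol or the BSP library's mechanism.

```c
#include <stddef.h>
#include <time.h>
#include <sys/socket.h>

/* Send `len` bytes in fixed-size chunks, pacing the sends so that the
 * applied load stays near `bytes_per_sec`.  A sketch of rate-controlled
 * communication only; real implementations adapt the rate to feedback. */
int paced_send(int sock, const char *buf, size_t len,
               size_t chunk, double bytes_per_sec) {
    double interval = (double)chunk / bytes_per_sec;    /* seconds per chunk */
    struct timespec ts;
    ts.tv_sec  = (time_t)interval;
    ts.tv_nsec = (long)((interval - (double)ts.tv_sec) * 1e9);

    for (size_t off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;
        if (send(sock, buf + off, n, 0) < 0)
            return -1;                                  /* propagate errors */
        nanosleep(&ts, NULL);                           /* hold applied rate */
    }
    return 0;
}
```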


Journal ArticleDOI
TL;DR: This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems.
Abstract: Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.). This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
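Of the two termination detection schemes, global task count is the simpler to sketch: each process counts the tasks it has created and completed, and the computation has terminated when the global totals agree and every process is idle. The MPI fragment below is a hedged, simplified illustration of that invariant (using a synchronous collective for clarity), not the authors' implementation.

```c
#include <mpi.h>

/* Local bookkeeping kept by every process. */
static long tasks_created   = 0;
static long tasks_completed = 0;

/* Collective termination test for the "global task count" scheme: returns
 * nonzero on all ranks once every created task has completed and no rank
 * is still working.  A production scheme would avoid the blocking
 * collective, but the condition being tested is the same. */
int globally_terminated(int locally_idle) {
    long local[3]  = { tasks_created, tasks_completed, locally_idle ? 0 : 1 };
    long global[3];
    MPI_Allreduce(local, global, 3, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
    return global[0] == global[1] && global[2] == 0;
}
```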

Journal ArticleDOI
TL;DR: An analytical tool is described which determines the performance characteristics of relational database transactions executing on particular machine configurations and provides simple graphical visualisations of these to enable users to obtain rapid insight into particular scenarios.
Abstract: The uptake of parallel DBMSs is being hampered by uncertainty about the impact on performance of porting database applications from sequential to parallel systems. The development of tools which aid the system manager or machine vendor could help to reduce this problem. This paper describes an analytical tool which determines the performance characteristics (in terms of throughput, resource utilisation and response time) of relational database transactions executing on particular machine configurations and provides simple graphical visualisations of these to enable users to obtain rapid insight into particular scenarios. The problems of handling different parallel DBMSs are illustrated with reference to three systems – Ingres, Informix and Oracle. A brief description is also given of two different approaches used to confirm the validity of the analytical approach on which the tool is based. Copyright © 1999 John Wiley & Sons, Ltd.
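The abstract does not disclose the tool's internal model, but the kind of quantities it reports (throughput, resource utilisation, response time) can be illustrated with textbook relationships: utilisation as the product of arrival rate and service demand, and response time growing sharply as utilisation approaches one. The calculation below is an M/M/1-style illustration of our own, not the model used by the tool.

```c
#include <stdio.h>

/* Purely illustrative open-system approximation of the throughput /
 * utilisation / response-time relationships such an analytical tool
 * reports; it is not the paper's model. */
int main(void) {
    double arrival_rate   = 40.0;    /* transactions per second offered   */
    double service_demand = 0.02;    /* seconds of disk time per txn      */

    double utilisation = arrival_rate * service_demand;        /* rho     */
    if (utilisation >= 1.0) {
        printf("device saturated (rho = %.2f)\n", utilisation);
        return 0;
    }
    double response = service_demand / (1.0 - utilisation);    /* M/M/1   */
    printf("utilisation %.0f%%, response time %.3f s\n",
           100.0 * utilisation, response);
    return 0;
}
```

With the figures above the disk is 80% utilised and the predicted response time is 0.1 s; doubling the offered load saturates the device, which is exactly the sort of insight a quick graphical visualisation is meant to give.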

Journal ArticleDOI
TL;DR: Several parallel implementations based on DASSLSO are explored and their performance when using the Message Passing Interface (MPI) on an SGI Origin 2000 is compared.
Abstract: In this paper, we discuss the parallel computation of the sensitivity analysis of systems of differential-algebraic equations (DAEs) with a moderate number of state variables and a large number of sensitivity parameters. Several parallel implementations based on DASSLSO are explored and their performance when using the Message Passing Interface (MPI) on an SGI Origin 2000 is compared. Copyright © 1999 John Wiley & Sons, Ltd.
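The natural source of parallelism here is that the sensitivity systems for different parameters can be integrated largely independently of one another, so one straightforward scheme distributes the sensitivity parameters across MPI ranks. The block partition below is a generic sketch of that distribution under our own assumptions; it says nothing about DASSLSO's actual interface.

```c
#include <mpi.h>
#include <stdio.h>

/* Block-distribute `nparam` sensitivity parameters over the ranks of
 * MPI_COMM_WORLD; each rank would then integrate the state equations plus
 * only its own block of sensitivity equations.  The partition is a generic
 * sketch, not the interface of DASSLSO. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size, nparam = 200;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int base  = nparam / size, rem = nparam % size;
    int count = base + (rank < rem ? 1 : 0);
    int first = rank * base + (rank < rem ? rank : rem);

    printf("rank %d integrates sensitivities %d..%d\n",
           rank, first, first + count - 1);
    /* ... invoke the DAE/sensitivity solver for parameters
     *     [first, first + count) on this rank ... */
    MPI_Finalize();
    return 0;
}
```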



Journal ArticleDOI
TL;DR: The design and capabilities of the SDSC Encryption and Authentication system are presented and future plans for enhancing this system are discussed.
Abstract: As part of the Distributed Object Computation Testbed project (DOCT) and the Data Intensive Computing initiative of the National Partnership for Advanced Computational Infrastructure (NPACI), the San Diego Supercomputer Center has designed and implemented a multi-platform encryption and authentication system referred to as the SDSC Encryption and Authentication, or SEA, system. The SEA system is based on RSA and RC5 encryption capabilities and is designed for use in an HPC/WAN environment containing diverse hardware architectures and operating systems (including Cray T90, Cray T3E, Cray J90, SunOS, Solaris, AIX, SGI, HP, NextStep, and Linux). The system includes the SEA library, which provides reliable, efficient, and flexible authentication and encryption capabilities between two processes communicating via TCP/IP sockets, and SEA utilities/daemons, which provide a simple key management system. It is currently in use by the SDSC Storage Resource Broker (SRB), as well as by user interface utilities to SDSC's installation of the High Performance Storage System (HPSS). This paper presents the design and capabilities of the SEA system and discusses future plans for enhancing this system. Copyright © 1999 John Wiley & Sons, Ltd.



Journal ArticleDOI
TL;DR: DSU is designed to assist geophysicists in developing and executing sequences of Seismic Unix (SU) applications in clusters of workstations as well as on tightly coupled multiprocessor machines.
Abstract: This paper describes a distributed system called Distributed Seismic Unix (DSU). DSU provides tools for creating and executing application sequences over several types of multiprocessor environments. DSU is designed to assist geophysicists in developing and executing sequences of Seismic Unix (SU) applications in clusters of workstations as well as on tightly coupled multiprocessor machines. SU is a large collection of subroutine libraries, graphics tools and fundamental seismic data processing applications that is freely available via the Internet from the Center for Wave Phenomena (CWP) of the Colorado School of Mines. SU is currently used at more than 500 sites in 32 countries around the world. DSU is built on top of three publicly available software packages: SU itself; TCL/TK, which provides the necessary tools to build the graphical user interface (GUI); and PVM (Parallel Virtual Machine), which supports process management and communication. DSU handles tree-like graphs representing sequences of SU applications. Nodes of a graph represent SU applications, while the arcs represent the way the data flow from the root node to the leaf nodes of the tree. In general the root node corresponds to an application that reads or creates synthetic seismic data, and the leaf nodes are associated with applications that write or display the processed seismic data; intermediate nodes are usually associated with typical seismic processing applications like filters, convolutions and signal processing. Pipelining parallelism is obtained when executing single-branch tree sequences, while a higher degree of parallelism is obtained when executing sequences with several branches. A major advantage of the DSU framework for distribution is that SU applications do not need to be modified for parallelism; only a few low-level system functions need to be modified. Copyright © 1999 John Wiley & Sons, Ltd.
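DSU's central data structure is the tree just described: nodes are SU applications, arcs carry the data from the root (a reader or generator) towards the leaves (writers or displays). A minimal C rendering of such a node makes the pipelining point concrete: a single-branch tree is a Unix-style pipeline, while several branches give independent pipelines that can run concurrently. The struct and field names are our own illustration, not DSU's actual data structures or real SU command names.

```c
#include <stdlib.h>
#include <string.h>

/* One node of a DSU-style processing tree: an application to run plus the
 * children that consume its output.  Field names are illustrative only. */
typedef struct su_node {
    char            *command;      /* e.g. a filter or display invocation  */
    struct su_node **children;     /* downstream applications              */
    int              nchildren;
} su_node;

static su_node *make_node(const char *cmd, int nchildren) {
    su_node *n   = malloc(sizeof *n);
    n->command   = strdup(cmd);
    n->nchildren = nchildren;
    n->children  = nchildren ? calloc(nchildren, sizeof *n->children) : NULL;
    return n;
}

/* A root that produces data, one intermediate filter, and two leaves that
 * display and store the result: the two branches below the filter could be
 * executed as two concurrent pipelines. */
su_node *example_tree(void) {
    su_node *root = make_node("synthetic-data reader", 1);
    su_node *filt = make_node("bandpass filter", 2);
    root->children[0] = filt;
    filt->children[0] = make_node("display", 0);
    filt->children[1] = make_node("write to disk", 0);
    return root;
}
```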

Journal ArticleDOI
TL;DR: The concurrent software design technique (CSDT) is compared with other existing approaches, and the lessons learned from experience with the technique are presented to highlight its benefits.
Abstract: Software design is the process of mapping software functional requirements into a set of modules for implementation. In this paper, a new design technique called the concurrent software design technique (CSDT) is proposed. CSDT extends software design techniques, which are based on structured analysis and design, by identifying independent concurrent tasks for implementation in multiprocessing, multitasking and the C/S environment. A case study on re-engineering a large legacy system, implemented on mainframes as a sequential system, to a C/S environment is presented next in order to highlight the benefits of the CSDT. Finally, this paper concludes with a comparison of CSDT with other existing approaches and the lessons learned from the experience with this technique. © 1999 John Wiley & Sons, Ltd.