
Showing papers on "SPMD published in 2003"


Proceedings ArticleDOI
07 Jun 2003
TL;DR: Experimental results demonstrate that OpenMP provides competitive performance compared to MPI for a large set of experimental conditions; however, the price of this performance is a strong programming effort on data set adaptation and inter-thread communications.
Abstract: When using a shared memory multiprocessor, the programmer faces the selection of the portable programming model which will deliver the best performance. Even if he restricts his choice to the standard programming environments (MPI and OpenMP), he has a choice of a broad range of programming approaches. To help the programmer in his selection, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmark (CG, MG, FT, LU), two dataset sizes (A and B) and two shared memory multiprocessors (IBM SP3 Night Hawk II, SGI Origin 3800). We also present a path from MPI to OpenMP SPMD guiding the programmers starting from an existing MPI code. We present the first SPMD OpenMP version of the NAS benchmark and compare it with other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared to MPI for a large set of experimental conditions. However, the price of this performance is a strong programming effort on data set adaptation and inter-thread communications. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences.
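The SPMD OpenMP style compared here against loop-level parallelism can be caricatured in a few lines: each thread derives its own block of the iteration space from its thread id, much as an MPI rank would, instead of letting the runtime split the loop. A minimal sketch, with plain Python standing in for an OpenMP parallel region and all helper names invented for illustration:

```python
# Hypothetical sketch of the SPMD OpenMP style: every thread runs the same
# code and computes its own block of the iteration space from its thread id.

def block_bounds(tid, nthreads, n):
    """Return the [lo, hi) iteration range owned by thread `tid`."""
    base, rem = divmod(n, nthreads)
    lo = tid * base + min(tid, rem)
    hi = lo + base + (1 if tid < rem else 0)
    return lo, hi

def spmd_sum(data, nthreads):
    partial = [0] * nthreads
    for tid in range(nthreads):          # stands in for a parallel region
        lo, hi = block_bounds(tid, nthreads, len(data))
        partial[tid] = sum(data[lo:hi])  # work on the privately owned block
    return sum(partial)                  # reduction across threads
```

The data-set adaptation the abstract mentions is exactly this kind of explicit ownership arithmetic, which is why the SPMD style costs more programming effort than loop-level directives.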

74 citations


Journal ArticleDOI
01 Feb 2003
TL;DR: The hybrid MPI+OpenMP programming model is compared with pure MPI, compiler based parallelization, and other parallel programming models on hybrid architectures, and also on whether programming paradigms can separate the optimization of communication and computation.
Abstract: Most HPC systems are clusters of shared memory nodes. Parallel programming must combine the distributed memory parallelization on the node interconnect with the shared memory parallelization inside each node. The hybrid MPI+OpenMP programming model is compared with pure MPI, compiler based parallelization, and other parallel programming models on hybrid architectures. The paper focuses on bandwidth and latency aspects, and also on whether programming paradigms can separate the optimization of communication and computation. Benchmark results are presented for hybrid and pure MPI communication. This paper analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes.

60 citations


Book ChapterDOI
02 Oct 2003
TL;DR: Preliminary experiments comparing CAF and MPI versions of several of the NAS parallel benchmarks on an Itanium 2 cluster with a Myrinet 2000 interconnect show that the CAF compiler delivers performance that is roughly equal to or, in many cases, better than that of programs parallelized using MPI, even though support for global optimization of communication has not yet been implemented in the compiler.
Abstract: Co-array Fortran (CAF) is an emerging model for scalable, global address space parallel programming that consists of a small set of extensions to the Fortran 90 programming language. Compared to MPI, the widely-used message-passing programming model, CAF’s global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for choreographing and optimizing communication from developers to compilers. This paper describes an open-source, portable, and retargetable CAF compiler under development at Rice University that is well-suited for today’s high-performance clusters. Our compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. Preliminary experiments comparing CAF and MPI versions of several of the NAS parallel benchmarks on an Itanium 2 cluster with a Myrinet 2000 interconnect show that our CAF compiler delivers performance that is roughly equal to or, in many cases, better than that of programs parallelized using MPI, even though support for global optimization of communication has not yet been implemented in our compiler.

54 citations


Patent
14 Nov 2003
TL;DR: In this paper, a system and method for efficiently executing single program multiple data (SPMD) programs in a microprocessor is described, where a micro SIMD unit is located within the microprocessor.
Abstract: A system and method is disclosed for efficiently executing single program multiple data (SPMD) programs in a microprocessor. A micro single instruction multiple data (SIMD) unit is located within the microprocessor. A job buffer that is coupled to the micro SIMD unit dynamically allocates tasks to the micro SIMD unit. The SPMD programs each comprise a plurality of input data streams having moderate diversification of control flows. The system executes each SPMD program once for each input data stream of the plurality of input data streams.
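The dispatch idea in the claim — the same program is queued once per input data stream, and a job buffer hands tasks to the execution unit — can be sketched as follows (an illustrative toy, not the patented design or its hardware interface):

```python
# Toy model of the job-buffer dispatch: one run of the same kernel per
# input stream, fed from a FIFO that stands in for the job buffer.
from collections import deque

def dispatch(kernel, streams):
    jobs = deque(streams)                       # the "job buffer"
    results = []
    while jobs:
        results.append(kernel(jobs.popleft()))  # one execution per stream
    return results
```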

47 citations


Journal ArticleDOI
TL;DR: This paper discusses and evaluates parallel implementations of a segmentation algorithm based on the Split-and-Merge approach and proposes and analyzes several strategies for the selection of region identifiers and their influence on execution time and load distribution.

31 citations


Journal ArticleDOI
01 Nov 2003
TL;DR: This work optimizes the sequential STEM-II (Sulphur Transport Eulerian Model 2) program, a large-scale pollution modeling application, to increase data locality, and then parallelizes it using OpenMP directives for shared memory systems and the MPI library for distributed memory machines.
Abstract: The aim of this work is to provide a high performance air quality simulation using the STEM-II (Sulphur Transport Eulerian Model 2) program, a large-scale pollution modeling application. First, we optimize the sequential program with the aim of increasing data locality. Then, we parallelize the program using OpenMP directives for shared memory systems, and the MPI library for distributed memory machines. Performance results are presented for an SGI O2000 multiprocessor, a Fujitsu AP3000 multicomputer and a cluster of PCs. Experimental results show that the parallel versions of the code achieve important reductions in the CPU time needed by each simulation. This will allow us to obtain results with adequate speed and reliability for the industrial environment where it is intended to be applied.

30 citations


Proceedings ArticleDOI
12 May 2003
TL;DR: Implementation of a "cluster-enabled" OpenMP compiler is presented, which converts programs written for OpenMP into parallel programs using the SCASH static library, moving all shared global variables into SCASH shared address space at runtime.
Abstract: OpenMP has attracted widespread interest because it is an easy-to-use parallel programming model for shared memory multiprocessor systems. Implementation of a "cluster-enabled" OpenMP compiler is presented. Compiled programs are linked to the page-based software distributed-shared-memory system, SCASH, which runs on PC clusters. This allows OpenMP programs to be run transparently in a distributed memory environment. The compiler converts programs written for OpenMP into parallel programs using the SCASH static library, moving all shared global variables into SCASH shared address space at runtime. As data mapping has a great impact on the performance of OpenMP programs compiled for software distributed-shared-memory, extensions to OpenMP directives are defined for specifying data mapping and loop scheduling behavior, allowing data to be allocated to the node where it is to be processed. Experimental results of benchmark programs on PC clusters using both Myrinet and fast Ethernet are reported.

24 citations


Journal ArticleDOI
TL;DR: This work aims at bringing single program multiple data (SPMD) programming into CORBA in a portable way, and shows that portable parallel CORBA objects can efficiently make use of high-performance networks.
Abstract: With the availability of Computational Grids, new kinds of applications are emerging. They raise the problem of how to program them on such computing systems. In this paper, we advocate a programming model based on a combination of parallel and distributed programming models. Compared to previous approaches, this work aims at bringing SPMD programming into CORBA in a portable way. For example, we want to interconnect two parallel codes by CORBA without modifying either CORBA or the parallel communication API. We show that such an approach does not entail any loss of performance compared to previous approaches that required modification to the CORBA standard. Moreover, using an ORB that is able to exploit high performance networks, we show that portable parallel CORBA objects can efficiently make use of such networks.

22 citations


Book ChapterDOI
26 Jun 2003
TL;DR: A tool that relieves users of writing SPMD style OpenMP by hand by automatically converting OpenMP programs into equivalent SPMD style OpenMP, showing how to modify array declarations and parallel loops and how to handle a variety of OpenMP constructs including REDUCTION, ORDERED clauses and synchronization.
Abstract: The scalability of an OpenMP program in a ccNUMA system with a large number of processors suffers from remote memory accesses, cache misses and false sharing. Good data locality is needed to overcome these problems whereas OpenMP offers limited capabilities to control it on ccNUMA architecture. A so-called SPMD style OpenMP program can achieve data locality by means of array privatization, and this approach has shown good performance in previous research. It is hard to write SPMD OpenMP code; therefore we are building a tool to relieve users from this task by automatically converting OpenMP programs into equivalent SPMD style OpenMP. We show the process of the translation by considering how to modify array declarations, parallel loops, and showing how to handle a variety of OpenMP constructs including REDUCTION, ORDERED clauses and synchronization. We are currently implementing these translations in an interactive tool based on the Open64 compiler.
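The array-privatization step such a tool performs can be illustrated with a toy example: a shared array updated inside a parallel loop is replaced by per-thread private slices that live in memory local to each ccNUMA node, and a REDUCTION clause becomes an explicit combine over the private copies. A hedged Python sketch (all names invented; the tool's real output is SPMD-style OpenMP code, not Python):

```python
# Sketch of array privatization: the shared array `a` of the original
# "for i: total += a[i]" loop is carved into per-thread private slices,
# each thread reduces over its own slice, then a final cross-thread
# reduction combines the partial results.

def privatized_reduction(n, nthreads):
    a = [float(i) for i in range(n)]              # originally shared array
    chunk = (n + nthreads - 1) // nthreads
    private = [a[t * chunk:(t + 1) * chunk]       # privatized slices
               for t in range(nthreads)]
    partials = [sum(p) for p in private]          # per-thread local work
    return sum(partials)                          # explicit REDUCTION
```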

19 citations


Proceedings ArticleDOI
20 Oct 2003
TL;DR: The results show DSM has similar performance to message passing for the embarrassingly parallel class; however, the performance of DSM is lower than PVM and MPI for the synchronous and loosely synchronous classes of problems.
Abstract: We compare the performance of the Treadmarks DSM system with two popular message passing systems (PVM and MPI). The comparison is done on 1, 2, 4, 8, 16, 24, and 32 nodes. Applications are chosen to represent three classes of problems: loosely synchronous, embarrassingly parallel, and synchronous. The results show DSM has similar performance to message passing for the embarrassingly parallel class. However, the performance of DSM is lower than PVM and MPI for the synchronous and loosely synchronous classes of problems. An analysis of the reasons is presented.

19 citations


Proceedings ArticleDOI
12 May 2003
TL;DR: This paper describes the implementation of the page management in Mome, a user-level distributed shared memory (DSM) that provides a shared segment space to parallel programs running on distributed memory computers or clusters.
Abstract: This paper describes the implementation of the page management in Mome, a user-level distributed shared memory (DSM). Mome provides a shared segment space to parallel programs running on distributed memory computers or clusters. Individual processes can request mappings between their local address space and Mome segments. The DSM handles the consistency of mapped memory regions at the page level. A node can freely select the consistency model which is applied to its own view of a page among two models: the classical strong consistency model and a simple and very basic weak model. Under the weak model, each process of the parallel application must send a consistency request to the DSM each time its view of the shared data needs to integrate modifications from other nodes. Mome targets the execution of programs from the high performance community using an SPMD computation model and the coupling of these simulation codes using an MIMD model.

Journal ArticleDOI
01 Jun 2003
TL;DR: Numerical results and elapsed-time measurements show the importance of using an appropriate load balancing algorithm and the reductions in elapsed time that can be achieved, and illustrate that the most suitable load balancing strategy may vary with the type of application and with the number of available processors.
Abstract: The central contribution of this work is SAMBA (Single Application, Multiple Load Balancing), a framework for the development of parallel SPMD (single program, multiple data) applications with load balancing. This framework models the structure and the characteristics common to different SPMD applications and supports their development. SAMBA also contains a library of load balancing algorithms. This environment allows the developer to focus on the specific problem at hand. Special emphasis is given to the identification of appropriate load balancing strategies for each application. Three different case studies were used to validate the functionality of the framework: matrix multiplication, numerical integration, and a genetic algorithm. These applications illustrate its ease of use and the relevance of load balancing. Their choice was oriented by the different load imbalance factors they present and by their different task creation mechanisms. The computational experiments reported for these case studies made possible the validation of SAMBA and the comparison, without additional reprogramming costs, of different load balancing strategies for each of them. The numerical results and the elapsed-time measurements show the importance of using an appropriate load balancing algorithm and the reductions in elapsed time that can be achieved. They also illustrate that the most suitable load balancing strategy may vary with the type of application and with the number of available processors. Besides the support to the development of SPMD applications, the load balancing facilities offered by SAMBA also play an important role in the development of efficient parallel implementations.
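Why the choice of strategy matters can be seen with a toy comparison (not SAMBA's API; the functions are invented for illustration): with uneven task costs, a static block assignment can leave one worker with most of the load, while a dynamic earliest-free-worker queue evens the finish times out.

```python
# Makespan (time until the last worker finishes) under two strategies.
from heapq import heappush, heappop

def static_makespan(costs, workers):
    """Static block assignment: contiguous chunks of tasks per worker."""
    chunk = (len(costs) + workers - 1) // workers
    return max(sum(costs[w * chunk:(w + 1) * chunk]) for w in range(workers))

def dynamic_makespan(costs, workers):
    """Dynamic assignment: each task goes to the earliest-free worker."""
    heap = [0.0] * workers            # current finish time of each worker
    for c in costs:
        heappush(heap, heappop(heap) + c)
    return max(heap)
```

For a task list with one expensive outlier, the dynamic queue's makespan is bounded by the outlier while the static split stacks cheap tasks on top of it — the kind of application-dependent difference the case studies measure.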

Book ChapterDOI
26 Jun 2003
TL;DR: How to interprocedurally detect whether the OpenMP program consistently schedules the parallel loops and where the strategy used to translate them differs from the straightforward approach that can otherwise be applied is explained.
Abstract: A so-called SPMD style OpenMP program can achieve scalability on ccNUMA systems by means of array privatization, and earlier research has shown good performance under this approach. Since it is hard to write SPMD OpenMP code, we showed a strategy for the automatic translation of many OpenMP constructs into SPMD style in our previous work. In this paper, we first explain how to interprocedurally detect whether the OpenMP program consistently schedules the parallel loops. If the parallel loops are consistently scheduled, we may carry out array privatization according to OpenMP semantics. We give two examples of code patterns that can be handled despite the fact that they are not consistent, and where the strategy used to translate them differs from the straightforward approach that can otherwise be applied.

Book ChapterDOI
02 Oct 2003
TL;DR: In this article, the authors present new algorithms to enforce sequential consistency for the special case of the Single Program Multiple Data (SPMD) model of parallelism, and present three polynomial-time methods that more accurately support programs with array accesses.
Abstract: The simplest semantics for parallel shared memory programs is sequential consistency in which memory operations appear to take place in the order specified by the program. But many compiler optimizations and hardware features explicitly reorder memory operations or make use of overlapping memory operations which may violate this constraint. To ensure sequential consistency while allowing for these optimizations, traditional data dependence analysis is augmented with a parallel analysis called cycle detection. In this paper, we present new algorithms to enforce sequential consistency for the special case of the Single Program Multiple Data (SPMD) model of parallelism. First, we present an algorithm for the basic cycle detection problem, which lowers the running time from O(n³) to O(n²). Next, we present three polynomial-time methods that more accurately support programs with array accesses. These results are a step toward making sequentially consistent shared memory programming a practical model across a wide range of languages and hardware platforms.
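The core subproblem — deciding whether a program-order edge lies on a cycle of program-order and conflict edges, in which case it must be preserved — reduces to cycle detection in a directed graph. A generic sketch of that reduction (a plain iterative DFS, not the paper's improved O(n²) algorithm):

```python
# Toy cycle detection over a graph of memory accesses: program-order edges
# within a thread plus conflict edges between threads. A back edge found
# during DFS means a cycle, so the edges on it must not be reordered.

def has_cycle(nodes, edges):
    """Detect a directed cycle with iterative DFS (colors: 0=new, 1=open, 2=done)."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
    color = {u: 0 for u in nodes}
    for start in nodes:
        if color[start]:
            continue
        stack = [(start, iter(adj[start]))]
        color[start] = 1
        while stack:
            u, it = stack[-1]
            for v in it:
                if color[v] == 1:
                    return True          # back edge => cycle
                if color[v] == 0:
                    color[v] = 1
                    stack.append((v, iter(adj[v])))
                    break
            else:
                color[u] = 2
                stack.pop()
    return False
```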

Proceedings ArticleDOI
27 Oct 2003
TL;DR: A simple but effective approach for parallelization of cellular neural networks for image processing is developed using the SPMD model and is based on the structural data parallel approach.
Abstract: In this paper a simple but effective approach for parallelization of cellular neural networks for image processing is developed. Digital gray-scale images were used to evaluate the program. The approach uses the SPMD (single-program multiple-data) model and is based on the structural data parallel approach (Schikuta et al, 1996). The process of parallelizing the algorithm employs HPF to generate an MPI-based program and the performance behavior was analyzed on two different cluster architectures.

Journal ArticleDOI
TL;DR: This work discusses optimization strategies used and their degree of success to increase performance of an MPI-based unstructured finite element simulation code written in Fortran 90 and discusses performance results based on implementations using several modern massively parallel computing platforms.
Abstract: The message-passing interface (MPI) has become the standard in achieving effective results when using the message passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss optimization strategies used and their degree of success to increase performance of an MPI-based unstructured finite element simulation code written in Fortran 90. We discuss performance results based on implementations using several modern massively parallel computing platforms including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.

Proceedings ArticleDOI
22 Apr 2003
TL;DR: A small SPMD library (SPMDlib) is developed on top of MPI; a new version contains far fewer, generic routines that can be optimized for different network topologies, and extensions for Fortran 90/95 and C are discussed.
Abstract: Most image processing algorithms can be parallelized by splitting parallel loops and by using very few communication patterns. Code parallelization using MPI still involves much programming overhead. In order to reduce these overheads, we first developed a small SPMD library (SPMDlib) on top of MPI. The programmer can use the library routines themselves, because they are easy to learn and to apply, even without knowing MPI. However, in order to increase user friendliness, we also developed a small set of parallelization and communication directives/pragmas (SPMDdir), together with a parser that converts these into library calls. SPMDdir is used to develop a new version of SPMDlib. This new version contains far fewer, but generic, routines that can be optimized for different network topologies. Extensions for Fortran 90/95 and C are discussed, as well as communication optimizations.

Journal ArticleDOI
01 Sep 2003
TL;DR: Some optimizations to achieve large simulations are presented, such as communication overlapping, cache-friendly data management, and the use of a parallel sparse PCG solver for Poisson's equation.
Abstract: In this work, we simulate the interaction between intense laser radiation and a fully ionized plasma by solving a Vlasov-Maxwell system using the "Particle-In-Cell" (PIC) method. This method provides a very detailed description of the plasma dynamics, but at the expense of large computer resources. Our SPMD 3D PIC code, CALDER, which is fully relativistic, is based on a spatial domain decomposition. Each processor is assigned one subdomain and is in charge of updating its field values and particle coordinates. This paper presents some optimizations to achieve large simulations, such as communication overlapping, cache-friendly data management, and the use of a parallel sparse PCG solver for Poisson's equation. Finally, we present the benefits from these optimizations on the IBM SP3 and the physical results for a large case simulation obtained on the CEA/DIF Teraflops parallel computer.

Book ChapterDOI
26 Jun 2003
TL;DR: A prototype runtime system, providing support at the backend of the NANOS OpenMP compiler, that enables the execution of unmodified OpenMP Fortran programs on both SMPs and clusters of multiprocessors, either through the hybrid programming model (MPI+OpenMP) or directly on top of Software Distributed Shared Memory (SDSM).
Abstract: This paper presents a prototype runtime system, providing support at the backend of the NANOS OpenMP compiler, that enables the execution of unmodified OpenMP Fortran programs on both SMPs and clusters of multiprocessors, either through the hybrid programming model (MPI+OpenMP) or directly on top of Software Distributed Shared Memory (SDSM). The latter is feasible by adopting a share-everything approach for the code generated by the OpenMP compiler, which corresponds to the "default shared" philosophy of OpenMP. Specifically, the user-level thread stacks and the Fortran common blocks are allocated explicitly, though transparently to the programmer, in shared memory. The management of the internal runtime system structures and of the fork-join multilevel parallelism is based on explicit communication, exploiting however the shared-memory hardware of the available SMP nodes whenever this is possible. The modular design of the runtime system allows the integration of existing unmodified SDSM libraries, despite their design for SPMD execution.

Journal ArticleDOI
TL;DR: The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations.

01 Jan 2003
TL;DR: This work introduces a new view of distributed computation, called the NavP view, under which a distributed program is composed of multiple sequential self-migrating threads called DSCs, which exhibit the properties of algorithmic integrity and parallel program composition orthogonality.
Abstract: We introduce a new view of distributed computation, called the NavP view, under which a distributed program is composed of multiple sequential self-migrating threads called DSCs. In contrast with those in the conventional SPMD style, programs developed in the NavP view exhibit the nice properties of algorithmic integrity and parallel program composition orthogonality, which make them clean and easy to develop and maintain. The NavP programs are also scalable. We use example code and performance data to demonstrate the advantages of using the NavP view for general purpose distributed parallel programming.

Proceedings ArticleDOI
22 Apr 2003
TL;DR: This work presents a simple case-study performance analysis using three programs from the SPLASH-2 suite, quantifies the overhead incurred by the programs when they are monitored with SBT, and concludes that the cost of the instrumentation is negligible.
Abstract: SBT is a portable library and tool for on-line debugging and performance monitoring of shared-memory parallel programs using the single-program-multiple-data (SPMD) model of parallelism. SPMD programs often use barriers to synchronize threads of execution and to delimit the start and end of different phases of computation. Through its useful barrier constructs, dynamic performance warnings, and integration with hardware event counter libraries, SBT helps programmers localize deadlocks and performance bottlenecks in their parallel programs. To demonstrate SBT's applicability and usefulness, we present a simple case-study performance analysis using three programs from the SPLASH-2 suite. In addition, we quantify the overhead incurred by the programs when they are monitored with SBT, and conclude that the cost of the instrumentation is negligible.
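The barrier-delimited-phases pattern that SBT instruments looks like this in miniature, with Python threads standing in for an SPMD runtime (all names are illustrative, not SBT's API):

```python
# Every thread logs phase 1, waits at the barrier, then logs phase 2.
# The barrier guarantees no thread enters phase 2 before all have
# finished phase 1 -- the phase boundary a barrier tool can monitor.
import threading

def run_phases(nthreads):
    barrier = threading.Barrier(nthreads)
    log, lock = [], threading.Lock()

    def worker(tid):
        with lock:
            log.append(("phase1", tid))
        barrier.wait()                 # all threads synchronize here
        with lock:
            log.append(("phase2", tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads: t.start()
    for t in threads: t.join()
    # every phase1 record precedes every phase2 record
    return all(p == "phase1" for p, _ in log[:nthreads])
```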

Book ChapterDOI
TL;DR: The DASUD (Diffusion Algorithm Searching Unbalanced Domains) algorithm has been implemented in an SPMD parallel-image thinning application to balance the workload across processors as computation proceeds, and was found to be effective in reducing computation time.
Abstract: The DASUD (Diffusion Algorithm Searching Unbalanced Domains) algorithm has been implemented in an SPMD parallel-image thinning application to balance the workload in the processors as computation proceeds and was found to be effective in reducing computation time. The average performance gain is about 40% for a test image of size 2688x1440 on a cluster of 12 PCs in a PVM environment.
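A generic diffusion step — not DASUD itself, whose search over unbalanced domains is more refined — conveys the idea: each processor repeatedly pushes a fraction of its excess load to lighter neighbors on a ring, so the workload evens out as computation proceeds.

```python
# Toy diffusion load balancing on a ring of processors. Total load is
# conserved; repeated rounds shrink the imbalance between neighbors.

def diffuse(load, rounds, alpha=0.5):
    n = len(load)
    for _ in range(rounds):
        nxt = load[:]
        for i in range(n):
            for j in ((i - 1) % n, (i + 1) % n):     # ring neighbors
                if load[i] > load[j]:
                    # send a fraction of the excess to the lighter neighbor
                    delta = alpha * (load[i] - load[j]) / 2
                    nxt[i] -= delta
                    nxt[j] += delta
        load = nxt
    return load
```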

Proceedings ArticleDOI
Suchuan Dong1, D. Lucor1, V. Symeonidis1, J. Xu1, George Em Karniadakis1 
09 Jun 2003
TL;DR: Because a greatly reduced number of processes are involved in the communications at each level, these multilevel parallel paradigms reduce the network latency overhead and enable the applications to scale to a large number of processors more efficiently.
Abstract: Realistic simulations of flow past a flexible cylinder subject to vortex-induced vibrations require a large number of Fourier modes along the cylinder span and high resolutions in the streamwise and cross-flow directions. Parallel computations employing a single-level parallelism for this type of problems have clear performance limitations that prevent effective scaling to the large processor count on modern supercomputers. We present two multilevel parallel paradigms based on MPI/MPI and MPI/OpenMP for high-order CFD methods within the spectral element framework and compare their performance. In the MPI/MPI model, we employ MPI process groups/communicators to decompose the flow domain and MPI processes into different levels. In the MPI/OpenMP model, we employ multiple OpenMP threads to split the workload within the subdomain and take a coarse-grain approach that significantly reduces the OpenMP synchronizations. For identical configurations the MPI/MPI model is observed to be generally more efficient. However, for dynamic p-refinement the MPI/OpenMP approach is more effective. Because a greatly reduced number of processes are involved in the communications at each level, these multilevel parallel paradigms reduce the network latency overhead and enable the applications to scale to a large number of processors more efficiently.
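The first level of the MPI/MPI decomposition boils down to splitting a flat rank into a (group, local) pair — the same color/key arithmetic one would pass to MPI_Comm_split when building per-level communicators. A tiny sketch of just that arithmetic (plain Python; the communicator calls themselves belong to MPI and are not shown):

```python
# Map a flat rank onto the two-level decomposition: which group of ranks
# (e.g. one Fourier-mode group) it belongs to, and its rank inside that
# group's subdomain communicator.

def two_level(rank, ranks_per_group):
    group = rank // ranks_per_group   # color for the group communicator
    local = rank % ranks_per_group    # key/rank within the group
    return group, local
```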

Proceedings ArticleDOI
TL;DR: A highly parallel (SIMD within SPMD) tokenizer for the APL language, itself written in APL, that serves the didactic purpose of demonstrating that a large amount of parallelism exists in non-numeric computation.
Abstract: We describe a highly parallel (SIMD within SPMD) tokenizer for the APL language, itself written in APL. The tokenizer does not break any new ground in the world of parallel computation, but does serve the didactic purpose of demonstrating that a large amount of parallelism exists in non-numeric computation. We plan to release the APEX APL Compiler, including the tokenizer, under the GNU Public License.

Book ChapterDOI
17 Dec 2003
TL;DR: An O(m²) algorithm for computing an analogous delay set for SPMD programs used in practice, which must be structured so that all variables are initialized before their values are read.
Abstract: We present compiler analysis for single program multiple data (SPMD) programs that communicate through shared address space. The choice of memory consistency model is sequential consistency as defined by Lamport [9]. Previous research has shown that these programs require cycle detection to perform any kind of code re-ordering either at hardware or software. So far, the best known cycle detection algorithm for SPMD programs has been given by Krishnamurthy et al. [5, 6, 8]. Their algorithm computes a delay set that is composed of those memory access pairs that if re-ordered either by hardware or software may cause violation of sequential consistency. This delay set is computed in O(m³) time where m is the number of read/write accesses. In this paper, we present an O(m²) algorithm for computing an analogous delay set for SPMD programs that are used in practice. These programs must be structured with the property that all the variables are initialized before their value is read.


Journal ArticleDOI
TL;DR: The local block distance between two active elements with the same offset and destination (source) in a processor is investigated, and an algorithm for the sending phase and the receive-execute phase is developed.

01 Oct 2003
TL;DR: NASA Langley Research Center has developed a library that allows Intel NX message passing codes to be executed under the more popular and widely supported Parallel Virtual Machine (PVM) message passing library.
Abstract: NASA Langley Research Center has developed a library that allows Intel NX message passing codes to be executed under the more popular and widely supported Parallel Virtual Machine (PVM) message passing library. PVM was developed at Oak Ridge National Labs and has become the de facto standard for message passing. This library will allow the many programs that were developed on the Intel iPSC/860 or Intel Paragon in a Single Program Multiple Data (SPMD) design to be ported to the numerous architectures that PVM (version 3.2) supports. Also, the library adds global operations capability to PVM. A familiarity with Intel NX and PVM message passing is assumed.

Book ChapterDOI
Takanobu Ogawa1
01 Jan 2003
TL;DR: An adaptive Cartesian mesh flow solver, in which a flow field is discretised with recursively subdivided rectangular meshes, is parallelized; a 2N-tree data structure is utilised to organise anisotropically refined meshes.
Abstract: An adaptive Cartesian mesh flow solver in which a flow field is discretised with recursively subdivided rectangular meshes is parallelized. A 2N-tree data structure is utilised to organise anisotropically refined meshes. Parallelization is based on the SPMD paradigm. The domain decomposition technique is used, and the tree data structure is split so that the computational load is balanced. Parallel efficiency is examined on a PC cluster.