
Showing papers on "Massively parallel" published in 1993


Proceedings ArticleDOI
01 Jul 1993
TL;DR: A new parallel machine model, called LogP, is offered that reflects the critical technology trends underlying parallel computers and is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers.
Abstract: A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
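The four parameters are conventionally written L (an upper bound on communication delay), o (per-message send/receive overhead), g (the minimum gap between consecutive messages, the reciprocal of per-processor communication bandwidth), and P (the number of processors). A minimal sketch of how an algorithm designer might cost messages under the model, with illustrative parameter values not taken from the paper:

```python
# Minimal LogP costing sketch (illustrative parameter values).
# L: latency, o: per-message send/receive overhead,
# g: gap between consecutive messages, P: processor count.
L, o, g, P = 6.0, 2.0, 4.0, 32

def point_to_point():
    """One message: sender overhead + network latency + receiver overhead."""
    return o + L + o

def pipelined_sends(n):
    """One processor injects n messages back-to-back; successive
    injections are separated by max(g, o), and the last message
    still pays latency plus the receiver's overhead."""
    return o + (n - 1) * max(g, o) + L + o

print(point_to_point())    # 10.0
print(pipelined_sends(8))  # 2 + 7*4 + 6 + 2 = 38.0
```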

1,515 citations


Book
01 Aug 1993
TL;DR: The most comprehensive work of its kind, Evolution and Optimum Seeking offers a state-of-the-art perspective on the field for researchers in computer-aided design, planning, control, systems analysis, computational intelligence, and artificial life.
Abstract: From the Publisher: With the publication of this book, Hans-Paul Schwefel has responded to rapidly growing interest in Evolutionary Computation, a field that originated, in part, with his pioneering work in the early 1970s. Evolution and Optimum Seeking offers a systematic overview of both new and classical approaches to computer-aided optimum system design methods, including the new class of Evolutionary Algorithms and other "Parallel Problem Solving from Nature" (PPSN) methods. It presents numerical optimization methods and algorithms for computer calculation, which will be particularly useful for massively parallel computers. It is the only book in the field that offers in-depth comparisons between classical direct optimization methods and the newer methods. Dr. Schwefel's method consists essentially of the adaptation of simple evolutionary rules to a computer procedure in the search for optimal parameters within a simulation model of a technical device. In addition to its historical and practical value, Evolution and Optimum Seeking will stimulate further research into PPSN and interdisciplinary thinking about multi-agent self-organization in natural and artificial environments. These developments have been accelerated by fortunate changes in the computational environment, especially with respect to new architectures. MIMD (Multiple Instructions Multiple Data) machines with many processors working in parallel on one task seem to lend themselves to inherently parallel problem solving concepts like Evolution Strategies. The most comprehensive work of its kind, Evolution and Optimum Seeking offers a state-of-the-art perspective on the field for researchers in computer-aided design, planning, control, systems analysis, computational intelligence, and artificial life. Its range and depth make it a virtual handbook for practitioners, from an epistemological introduction to the concepts and strategies of optimum seeking to a taxonomy of optimization tasks and solution principles.

704 citations


Proceedings ArticleDOI
R.E. Kessler1, J.L. Schwarzmeier1
01 Jan 1993
TL;DR: Cray Research's massively parallel processing (MPP) philosophy is presented, together with a brief description of the design of the Cray T3D, the first MPP designed by Cray Research, and the 3-D torus interprocessor interconnect is discussed.
Abstract: The authors present Cray Research's massively parallel processing (MPP) philosophy, together with a brief description of the design of the Cray T3D, the first MPP designed by Cray Research. They give a brief overview of the important features of the Cray T3D, including the 3-D torus interprocessor interconnect. They discuss in more detail the motivation for the 3-D torus interconnect. Using a very simple capacity model of network performance, they show how three dimensions provide a solid balance between locality and scalability up to thousands of nodes.
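The dimensionality argument can be sanity-checked with a back-of-the-envelope estimate (a rough sketch, not the authors' exact capacity model): for a fixed node count N arranged as a d-dimensional torus of side k = N^(1/d), the average distance per ring dimension is about k/4, so average route length falls sharply from one to three dimensions but only marginally beyond, while wiring cost keeps growing.

```python
# Rough average-route-length estimate for an N-node d-dimensional torus
# (illustrative back-of-the-envelope model, not the paper's).
def avg_distance(N, d):
    k = N ** (1.0 / d)      # nodes per dimension
    return d * k / 4.0      # ~k/4 average hops per ring dimension

N = 1024
for d in (1, 2, 3, 4):
    print(d, round(avg_distance(N, d), 2))
# 1 256.0 / 2 16.0 / 3 7.56 / 4 5.66 -- diminishing returns past 3-D
```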

327 citations


Proceedings ArticleDOI
06 Oct 1993
TL;DR: Pablo is a performance analysis environment designed to provide unobtrusive performance data capture, analysis, and presentation across a wide variety of scalable parallel systems.
Abstract: Developers of application codes for massively parallel computer systems face daunting performance tuning and optimization problems that must be solved if massively parallel systems are to fulfill their promise. Recording and analyzing the dynamics of application program, system software, and hardware interactions is the key to understanding and the prerequisite to performance tuning, but this instrumentation and analysis must not unduly perturb program execution. Pablo is a performance analysis environment designed to provide unobtrusive performance data capture, analysis, and presentation across a wide variety of scalable parallel systems. Current efforts include dynamic statistical clustering to reduce the volume of data that must be captured and complete performance data immersion via head-mounted displays.

299 citations


Journal ArticleDOI
TL;DR: The authors describe their work on the massively parallel finite-element computation of compressible and incompressible flows with the CM-200 and CM-5 Connection Machines, which provides a capability for solving a large class of practical problems involving free surfaces, two-liquid interfaces, and fluid-structure interactions.
Abstract: The authors describe their work on the massively parallel finite-element computation of compressible and incompressible flows with the CM-200 and CM-5 Connection Machines. Their computations are based on implicit methods, and their parallel implementations are based on the assumption that the mesh is unstructured. Computations for flow problems involving moving boundaries and interfaces are achieved by using the deformable-spatial-domain/stabilized-space-time method. Using special mesh update schemes, the frequency of remeshing is minimized to reduce the projection errors involved and also to make parallelizing the computations easier. This method and its implementation on massively parallel supercomputers provide a capability for solving a large class of practical problems involving free surfaces, two-liquid interfaces, and fluid-structure interactions.

262 citations


Journal ArticleDOI
TL;DR: The High Performance Fortran Forum (HPFF) as discussed by the authors was a coalition of computer vendors, government laboratories, and academic groups founded in 1992 to improve the performance and usability of Fortran-90 for computationally intensive applications on a wide variety of machines, including massively parallel single-instruction multiple-data (SIMD) and MIMD systems and vector processors.
Abstract: The article discusses Fortran-90, its basis in Fortran-77, its implications for parallel machines, and the extensions developed for it by the High Performance Fortran Forum (HPFF), a coalition of computer vendors, government laboratories, and academic groups founded in 1992 to improve the performance and usability of Fortran-90 for computationally intensive applications on a wide variety of machines, including massively parallel single-instruction multiple-data (SIMD) and multiple-instruction multiple-data (MIMD) systems and vector processors. It describes SIMD and MIMD systems, previous attempts to develop languages for them, the genesis of the HPFF, how the group actually worked, and the HPF programming model.

234 citations


Journal ArticleDOI
TL;DR: In this paper, the evolution of nonlinear dynamical systems such as fluids and plasmas is being investigated in three dimensions at increasingly high resolutions, and the authors expect the resolution to increase to 1000³ by the end of the decade.
Abstract: With the advent of massively parallel computers, the evolution of nonlinear dynamical systems such as fluids and plasmas is being investigated in three dimensions at increasingly high resolutions. Today a typical physical volume is represented by 100³ grid points, and we may expect the resolution to increase to 1000³ by the end of the decade.

190 citations


Journal ArticleDOI
TL;DR: The initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW, that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.
Abstract: We believe the paucity of massively parallel, shared-memory machines follows from the lack of a shared-memory programming performance model that can inform programmers of the cost of operations (so they can avoid expensive ones) and can tell hardware designers which cases are common (so they can build simple hardware to optimize them). Cooperative shared memory, our approach to shared-memory design, addresses this problem. Our initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW. In CICO, programs bracket uses of shared data with a check_out directive marking the expected first use and a check_in directive terminating the expected use of the data. A cooperative prefetch directive helps hide communication latency. Dir1SW is a minimal directory protocol that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.
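Schematically, the CICO discipline looks like the sketch below. The directive names follow the abstract, but the Python stand-ins are hypothetical: real CICO annotations are performance hints to the memory system, not data-moving calls, so a correct program behaves the same if they are ignored.

```python
# CICO bracketing sketch (hypothetical stand-ins; real annotations are
# hints, so a correct program behaves the same if they are ignored).
def check_out(block, exclusive=False):  # expected first use of a block
    pass

def check_in(block):                    # expected last use of a block
    pass

def prefetch(block):                    # hide latency of a later check_out
    pass

def relax_point(grid, i):
    prefetch(grid[(i + 1) % len(grid)])     # overlap communication
    check_out(grid[i], exclusive=True)      # bracket: first expected use
    grid[i] = 0.5 * (grid[i - 1] + grid[(i + 1) % len(grid)])
    check_in(grid[i])                       # bracket: last expected use

grid = [1.0, 2.0, 3.0, 4.0]
relax_point(grid, 1)
print(grid)  # [1.0, 2.0, 3.0, 4.0]
```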

172 citations


Proceedings ArticleDOI
01 Nov 1993
TL;DR: This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5, and networked workstations.
Abstract: This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5 and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolume concurrently. No communication between processing units is needed during this local ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
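The compositing step can be sketched with the standard "over" operator on premultiplied-alpha pixels (an illustrative reduction, not the paper's exact implementation). Because "over" is associative, subimages can also be combined pairwise in parallel as long as the front-to-back order is respected.

```python
# Ordered subimage compositing sketch with the "over" operator
# (premultiplied alpha; illustrative, one pixel per subimage).
def over(front, back):
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)

def composite(subimages_front_to_back):
    result = (0.0, 0.0)               # fully transparent
    for pixel in subimages_front_to_back:
        result = over(result, pixel)
    return result

print(composite([(0.2, 0.4), (0.5, 0.6)]))  # (0.5, 0.76)
```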

128 citations


Book
01 Jul 1993
TL;DR: Current issues involved in the development of systems which support fine-grain concurrency in a single shared address space are discussed, including algorithmic, architectural, technological, and programming issues.
Abstract: A major challenge for computer science in the 1990s is to determine the extent to which general purpose parallel computing can be achieved. The goal is to deliver both scalable parallel performance and architecture independent parallel software. (Work in the 1980s showed that either of these alone can be achieved.) Success in this endeavour would permit the long overdue separation of software considerations in parallel computing from those of hardware. This separation would, in turn, encourage the growth of a large and diverse parallel software industry, and provide a focus for future hardware developments. In recent years a number of new routing and memory management techniques have been developed which permit the efficient implementation of a single shared address space on distributed memory architectures. We also now have a large set of efficient, practical shared memory parallel algorithms for important problems. In this paper we discuss some of the current issues involved in the development of systems which support fine-grain concurrency in a single shared address space. The paper covers algorithmic, architectural, technological, and programming issues.

119 citations


Patent
13 Dec 1993
TL;DR: In this article, a messaging facility is described that enables the passing of data from one processing element to another in a globally addressable, distributed memory multiprocessor without having an explicit destination address in the target processing element's memory.
Abstract: A messaging facility is described that enables the passing of packets of data from one processing element to another in a globally addressable, distributed memory multiprocessor without having an explicit destination address in the target processing element's memory. The messaging facility can be used to accomplish a remote action by defining an opcode convention that permits one processor to send a message containing opcode, address and arguments to another. The destination processor, upon receiving the message after the arrival interrupt, can decode the opcode and perform the indicated action using the argument address and data. The messaging facility provides the primitives for the construction of an interprocessor communication protocol. Operating system communication and message-passing programming models can be accomplished using the messaging facility.
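A schematic of the opcode convention (hypothetical opcode names and handlers invented for illustration; the patent defines the convention abstractly, not this API): a message carries an opcode, a target address, and arguments, and the receiving processor decodes and acts on arrival.

```python
# Opcode-convention sketch (hypothetical opcodes and handler names).
memory = {}

def op_write(addr, value):
    memory[addr] = value

def op_add(addr, value):                  # a remote atomic update
    memory[addr] = memory.get(addr, 0) + value

HANDLERS = {"WRITE": op_write, "ADD": op_add}

def on_arrival(message):                  # run from the arrival interrupt
    opcode, addr, args = message
    HANDLERS[opcode](addr, *args)

on_arrival(("WRITE", 0x100, (7,)))
on_arrival(("ADD", 0x100, (5,)))
print(memory[0x100])                      # 12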

Proceedings ArticleDOI
01 Jul 1993
TL;DR: A new programming paradigm called ActorSpace is presented, which provides powerful support for component-based construction of massively parallel and distributed applications and open interfaces to servers and pattern-directed access to software repositories.
Abstract: We present a new programming paradigm called ActorSpace. ActorSpace provides a new communication model based on destination patterns. An actorSpace is a computationally passive container of actors which acts as a context for matching patterns. Patterns are matched against listed attributes of actors and actorSpaces that are visible in the actorSpace. Both visibility and attributes are dynamic. Messages may be sent to one or all members of a group defined by a pattern. The paradigm provides powerful support for component-based construction of massively parallel and distributed applications. In particular, it supports open interfaces to servers and pattern-directed access to software repositories.
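Pattern-directed delivery can be sketched as follows (a hypothetical API invented for illustration; the paper defines the model, not this code): actors list attributes, and a message goes to one or all actors whose attributes match the destination pattern.

```python
import random

# ActorSpace-style pattern-directed delivery sketch (hypothetical API).
class ActorSpace:
    def __init__(self):
        self.actors = []                      # (attributes, mailbox) pairs

    def register(self, attributes):
        mailbox = []
        self.actors.append((attributes, mailbox))
        return mailbox

    def matching(self, pattern):              # pattern is a subset match
        return [mbox for attrs, mbox in self.actors
                if pattern.items() <= attrs.items()]

    def send_one(self, pattern, msg):         # one member of the group
        random.choice(self.matching(pattern)).append(msg)

    def broadcast(self, pattern, msg):        # all members of the group
        for mbox in self.matching(pattern):
            mbox.append(msg)

space = ActorSpace()
a = space.register({"service": "print", "color": True})
b = space.register({"service": "print", "color": False})
space.broadcast({"service": "print"}, "job-1")
print(len(a), len(b))                         # 1 1
```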

Journal ArticleDOI
TL;DR: A new heuristic algorithm to perform tabu search on the Quadratic Assignment Problem (QAP) is developed and a new intensification strategy based on intermediate term memory is proposed and shown to be promising especially while solving large QAPs.
Abstract: A new heuristic algorithm to perform tabu search on the Quadratic Assignment Problem (QAP) is developed. A massively parallel implementation of the algorithm on the Connection Machine CM-2 is provided. The implementation uses n² processors, where n is the size of the problem. The elements of the algorithm, called Par_tabu, include dynamically changing tabu list sizes, aspiration criterion and long term memory. A new intensification strategy based on intermediate term memory is proposed and shown to be promising especially while solving large QAPs. The combination of all these elements gives a very efficient heuristic for the QAP: the best known or improved solutions are obtained in a significantly smaller number of iterations than in other comparative studies. Combined with the implementation on CM-2, this approach provides suboptimal solutions to QAPs of bigger dimensions in reasonable time.
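For orientation, the skeleton of tabu search on the QAP with pairwise exchanges looks roughly like the sequential sketch below. It is illustrative only: it omits Par_tabu's dynamic tabu list sizes, intermediate and long term memory, and the n²-processor parallelization.

```python
import itertools, random

# Sequential tabu-search skeleton for the QAP (illustrative sketch).
def qap_cost(p, flow, dist):
    n = len(p)
    return sum(flow[i][j] * dist[p[i]][p[j]]
               for i in range(n) for j in range(n))

def tabu_search(flow, dist, iters=100, tenure=7):
    n = len(flow)
    p = list(range(n)); random.shuffle(p)
    best, best_cost = p[:], qap_cost(p, flow, dist)
    tabu = {}                       # swap (i, j) -> iteration it frees up
    for t in range(iters):
        candidates = []
        for i, j in itertools.combinations(range(n), 2):
            q = p[:]; q[i], q[j] = q[j], q[i]
            c = qap_cost(q, flow, dist)
            # aspiration: allow a tabu swap if it beats the best so far
            if tabu.get((i, j), -1) < t or c < best_cost:
                candidates.append((c, i, j, q))
        c, i, j, p = min(candidates)            # best admissible neighbor
        tabu[(i, j)] = t + tenure
        if c < best_cost:
            best, best_cost = p[:], c
    return best, best_cost

n = 6
flow = [[abs(i - j) for j in range(n)] for i in range(n)]
dist = [[(i + j) % n for j in range(n)] for i in range(n)]
print(tabu_search(flow, dist)[1])
```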


Journal ArticleDOI
TL;DR: The present DDM-based parallel finite element algorithm is combined with a hierarchical model for data and processor management so that the workload is balanced dynamically among the processors.

Book
26 Mar 1993
TL;DR: Low Level Parallel Image Processing; Parallel FFT-like Transform Algorithms on Transputers; Parallel Edge Detection and Related Algorithms; Parallel Segmentation Algorithms; MIMD and SIMD Parallel Range Data Segmentation.
Abstract: Low Level Parallel Image Processing; Parallel FFT-like Transform Algorithms on Transputers; Parallel Edge Detection and Related Algorithms; Parallel Segmentation Algorithms; MIMD and SIMD Parallel Range Data Segmentation; Parallel Stereo and Motion Estimation; Parallel Implementations of the Backpropagation Learning Algorithm Based on Network Topology; Parallel Neural Computation Based on Algebraic Partitioning; Parallel Neural Computing Based on Network Duplicating; PARALLEL EIKONA: A Parallel Digital Image Processing Package; Parallel Architectures and Algorithms for Real Time Computer Vision; Index.

Journal ArticleDOI
TL;DR: The Vesta parallel file system provides user-directed checkpointing of files during continuing program execution with very little processing overhead and is scalable to a very large number of I/O and compute nodes.
Abstract: The Vesta parallel file system provides parallel access from compute nodes to files distributed across I/O nodes in a massively parallel computer. Vesta is intended to solve the I/O problems of massively parallel computers executing numerically intensive scientific applications. Vesta has three interesting characteristics: First, it provides a user defined parallel view of file data, and allows user defined partitioning and repartitioning of files without moving data among I/O nodes. The parallel file access semantics of Vesta directly support the operations required by parallel language I/O libraries. Second, Vesta is scalable to a very large number (many hundreds) of I/O and compute nodes and does not contain any sequential bottlenecks in the data-access path. Third, it provides user-directed checkpointing of files during continuing program execution with very little processing overhead.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: The predicted parameter values allow a realistic ranking of different program versions with respect to actual runtime, and experiments show a strong correlation between the statically computed parameters and actual measurements.
Abstract: This paper presents a Parameter based Performance Prediction Tool (PPPT) which is part of the Vienna Fortran Compilation System (VFCS), a compiler that automatically translates Fortran programs into message passing programs for massively parallel architectures. The PPPT is applied to an explicitly parallel program generated by the VFCS, which may contain synchronous as well as asynchronous communication and is attributed with parameters computed in a previous profiling run. It statically computes a set of optional parameters that characterize the behavior of the parallel program. This includes work distribution, the number of data transfers, the amount of data transferred, transfer times, network contention, and the number of cache misses. These parameters can be selectively determined for statements, loops, procedures, and the entire program; furthermore, their effect with respect to individual processors can be examined. The tool plays an important role in the VFCS by providing the system as well as the user with vital performance information about the program. In particular, it supports automatic data distribution generation and the intelligent selection of transformation strategies, based on properties of the algorithm and characteristics of the target architecture. The tool has been implemented. Experiments show a strong correlation between the statically computed parameters and actual measurements; furthermore, it turns out that the predicted parameter values allow a realistic ranking of different program versions with respect to the actual runtime.
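In spirit, a parameter-based prediction combines a work-distribution term with communication terms; the formula, names, and numbers below are illustrative, not the VFCS implementation.

```python
# Parameter-based runtime estimate sketch (illustrative formula/values).
def estimate_time(work_per_proc, n_transfers, bytes_per_transfer,
                  flop_time=1e-8, startup=1e-4, per_byte=1e-7):
    compute = max(work_per_proc) * flop_time   # slowest processor dominates
    comm = n_transfers * (startup + bytes_per_transfer * per_byte)
    return compute + comm

# Rank two hypothetical versions of a program by predicted runtime.
balanced = estimate_time([1e6] * 4, n_transfers=10, bytes_per_transfer=8192)
skewed   = estimate_time([4e6, 0, 0, 0], n_transfers=10, bytes_per_transfer=8192)
print(balanced < skewed)  # True: the better work distribution wins
```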

Patent
12 Jul 1993
TL;DR: A massively parallel electron beam array for controllably imaging a target includes a multiplicity of emitter cathodes, each incorporating one or more micron-sized emitter tips.
Abstract: A massively parallel electron beam array for controllably imaging a target includes a multiplicity of emitter cathodes, each incorporating one or more micron-sized emitter tips. Each tip is controlled by a control electrode to produce an electron stream, and its deflection is controlled by a multielement deflection electrode to permit scanning of a corresponding target region.

Proceedings ArticleDOI
01 Dec 1993
TL;DR: The Vesta interface provides a user-defined parallel view of file data, which gives users some control over the layout of data, and six parallel access modes to Vesta files are defined, which give users very versatile parallel file access.
Abstract: The Vesta parallel file system is intended to solve the I/O problems of massively parallel multicomputers executing numerically intensive scientific applications. It provides parallel access from the applications to files distributed across multiple storage nodes in the multicomputer, thereby exposing an opportunity for high-bandwidth data transfer across the multicomputer's low-latency network. The Vesta interface provides a user-defined parallel view of file data, which gives users some control over the layout of data. This is useful for tailoring data layout to match common access patterns. The interface also allows user-defined partitioning and repartitioning of files without moving data among storage nodes. Libraries with higher-level interfaces that hide the layout details, while exploiting the power of parallel access, may be implemented above the basic interface. It is shown how collective I/O operations can be implemented, and six parallel access modes to Vesta files are defined. Each mode has unique characteristics in terms of how the processes share the file and how their accesses are interleaved. The combination of user-defined file partitioning and the six access modes gives users very versatile parallel file access.
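The key idea, that a view changes which logical records a process sees without moving any data, can be sketched as follows (hypothetical round-robin layout and API, invented for illustration):

```python
# Vesta-style partitioned view sketch (hypothetical layout and API).
RECORDS = 16            # records in the file
NODES = 4               # record r lives on storage node r % NODES

def view(n_partitions, partition):
    """The records one process sees under a round-robin partitioning."""
    return [r for r in range(RECORDS) if r % n_partitions == partition]

# Repartitioning from 4 ways to 2 ways changes the views only;
# each record stays on the storage node where it was written.
print(view(4, 0))                        # [0, 4, 8, 12]
print(view(2, 0))                        # [0, 2, 4, 6, 8, 10, 12, 14]
print([r % NODES for r in view(2, 0)])   # placement is unchanged
```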

Patent
01 Dec 1993
TL;DR: In this paper, an application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers, is proposed, where global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing.
Abstract: An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
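The mechanism can be illustrated with a one-dimensional diffusion sketch (illustrative only, not the patented method): each processor balances only within its local neighborhood, and because neighborhoods overlap, imbalance spreads until the load is globally even.

```python
# Overlapping-neighborhood balancing sketch (illustrative diffusion).
def balance_step(load):
    n = len(load)
    return [(load[(i - 1) % n] + load[i] + load[(i + 1) % n]) / 3.0
            for i in range(n)]           # move toward the local average

load = [90.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # all work starts on one node
for _ in range(50):
    load = balance_step(load)
print([round(x, 1) for x in load])        # -> 15.0 everywhere (the mean)
```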

Journal ArticleDOI
TL;DR: The stabilized space-time formulation for moving boundaries and interfaces, and a new stabilized velocity-pressure-stress formulation are both described, and significant aspects of the implementation of these methods on massively parallel architectures are discussed.

Patent
13 Dec 1993
TL;DR: In this article, address translation means for distributed memory massively parallel processing (MPP) systems include means for defining virtual addresses for processing elements (PE's) and memory relative to a partition of PE's under program control, and physical addresses for PE's and memory corresponding to identities and locations of PE modules within computer cabinetry.
Abstract: Address translation means for distributed memory massively parallel processing (MPP) systems include means for defining virtual addresses for processing elements (PE's) and memory relative to a partition of PE's under program control, means for defining logical addresses for PE's and memory within a three-dimensional interconnected network of PE's in the MPP, and physical addresses for PE's and memory corresponding to identities and locations of PE modules within computer cabinetry. As physical PE's are mapped into or out of the logical MPP, as spares are needed, logical addresses are updated. Address references generated by a PE within a partition in virtual address mode are converted to logical addresses and physical addresses for routing on the network.
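Conceptually the translation is two table lookups; the sketch below uses made-up table contents to show how mapping in a spare touches only the logical-to-physical table, leaving program-visible virtual PE numbers unchanged.

```python
# Virtual -> logical -> physical PE translation sketch (illustrative).
partition_base = 8                  # this partition starts at logical PE 8
logical_to_physical = {8: 40, 9: 41, 10: 42, 11: 43}   # module slots

def translate(virtual_pe):
    logical = partition_base + virtual_pe
    return logical_to_physical[logical]

print(translate(2))                 # 42
logical_to_physical[10] = 99        # swap in a spare for a failed module
print(translate(2))                 # 99; the program still uses virtual PE 2
```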

Patent
22 Nov 1993
TL;DR: In this article, a two-dimensional input/output system for a massively parallel SIMD computer system providing an interface for the two-way transfer of data between a host computer and the SIMD computers is presented.
Abstract: A two-dimensional input/output system for a massively parallel SIMD computer system providing an interface for the two-way transfer of data between a host computer and the SIMD computer. A plurality of buffers, equal in number to and distributed with the individual processing elements of the SIMD computer, are used to provide a temporary storage area which allows data in different formats to be mapped in a format suitable for transfer to the host computer or for transfer to the SIMD processing elements. The temporary storage is controlled in such a way as to transfer entire blocks of data in a single SIMD system clock cycle, thereby achieving an input/output data rate of N bits/cycle for a SIMD computer consisting of N processors. The system is capable of handling irregular as well as regular data structures. The system also emphasizes a distributed approach in having the input/output system divided into N pieces and distributed to each processor to reduce the wiring complexity while maintaining the I/O rate.

Proceedings ArticleDOI
26 Apr 1993
TL;DR: The authors describe a file system design for massively parallel computers which makes very efficient use of a few disks per processor, overcoming the traditional input/output (I/O) bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network.
Abstract: The authors describe a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional input/output (I/O) bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA (Rapid Access to Massive Archive), requires little internode synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
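A defining feature of this design direction is that block placement is computable rather than looked up, so any node can find a block without consulting a central directory. The hash and layout below are illustrative, not RAMA's actual on-disk scheme.

```python
import hashlib

# Hashed block placement sketch (illustrative hash and layout).
N_NODES, DISKS_PER_NODE = 64, 2

def place(file_id, block_no):
    h = hashlib.md5(f"{file_id}:{block_no}".encode()).digest()
    key = int.from_bytes(h[:8], "big")
    return key % N_NODES, (key // N_NODES) % DISKS_PER_NODE

# Any node computes the same (node, disk) pair with no synchronization.
print(place("simulation.dat", 0))
```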

Journal ArticleDOI
TL;DR: It is proved that with high probability the algorithms produce well-balanced storage for sufficiently large matrices with a bounded number of nonzeros in each row and column, but no other restrictions on structure.
Abstract: This paper investigates the balancing of distributed compressed storage of large sparse matrices on a massively parallel computer. For fast computation of matrix–vector and matrix–matrix products on a rectangular processor array with efficient communications along its rows and columns, it is required that the nonzero elements of each matrix row or column be distributed among the processors located within the same array row or column, respectively. Randomized packing algorithms are constructed with such properties, and it is proved that with high probability the algorithms produce well-balanced storage for sufficiently large matrices with a bounded number of nonzeros in each row and column, but no other restrictions on structure. Then basic matrix–vector multiplication routines are described with fully parallel interprocessor communications and intraprocessor gather and scatter operations. Their efficiency is demonstrated on the 16,384-processor MasPar computer.
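The row/column constraint forces nonzero (i, j) onto processor (row_map(i), col_map(j)); choosing the maps as random permutations is the randomized-packing idea in miniature (an illustrative sketch, not the paper's exact algorithms):

```python
import random

# Randomized packing sketch: matrix row i maps to processor-array row
# pi[i] % R and column j to array column pj[j] % C, so row and column
# communication stays within one array row/column while random
# permutations balance the per-processor nonzero counts.
N, R, C = 1000, 4, 4
pi = list(range(N)); random.shuffle(pi)
pj = list(range(N)); random.shuffle(pj)

def place(i, j):
    return pi[i] % R, pj[j] % C

counts = [[0] * C for _ in range(R)]
for _ in range(20000):                    # a random sparsity pattern
    r, c = place(random.randrange(N), random.randrange(N))
    counts[r][c] += 1
print(min(map(min, counts)), max(map(max, counts)))  # nearly equal
```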

Journal ArticleDOI
TL;DR: Different modifications of a class of parallel algorithms, initially designed by A. Bellen and M. Zennaro for difference equations and called “across the steps” methods, are studied for the purpose of solving initial value problems in ordinary differential equations on a massively parallel computer.
Abstract: In this paper, we study different modifications of a class of parallel algorithms, initially designed by A. Bellen and M. Zennaro for difference equations and called "across the steps" methods by their authors, for the purpose of solving initial value problems in ordinary differential equations (ODEs) on a massively parallel computer. Restriction to dissipative problems is discussed, which allows these problems to be solved efficiently, as shown by the simulations.
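The flavor of an "across the steps" method can be shown in a few lines (an illustrative sketch, not the authors' schemes): keep the whole trajectory, and let every step update simultaneously from the previous iterate; for a dissipative problem the sweep contracts toward the sequential solution.

```python
# "Across the steps" iteration sketch for y' = -y, y(0) = 1 (explicit
# Euler; every step within a sweep is independent, hence parallel in k).
f = lambda y: -y
h, n_steps = 0.1, 50
y = [1.0] * (n_steps + 1)            # initial guess: constant trajectory

for sweep in range(100):
    y = [1.0] + [y[k - 1] + h * f(y[k - 1]) for k in range(1, n_steps + 1)]

print(round(y[-1], 6))               # ~(1 - h)**50 = 0.005154
```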

Book ChapterDOI
01 Jan 1993
TL;DR: The Bird-Meertens formalism is an approach to developing and executing data-parallel programs; it encourages software development by equational transformation; it can be implemented efficiently across a wide range of architecture families; and it is equipped with a realistic cost calculus, so that trade-offs in software design can be explored before implementation.
Abstract: The expense of developing and maintaining software is the major obstacle to the routine use of parallel computation. Architecture independent programming offers a way of avoiding the problem, but the requirements for a model of parallel computation that will permit it are demanding. The Bird-Meertens formalism is an approach to developing and executing data-parallel programs; it encourages software development by equational transformation; it can be implemented efficiently across a wide range of architecture families; and it can be equipped with a realistic cost calculus, so that trade-offs in software design can be explored before implementation. It makes an ideal model of parallel computation.
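A small taste of the equational style (illustrative; the formalism itself is usually presented in functional notation): the law map f . map g = map (f . g) fuses two data-parallel passes into one, and reduction by an associative operator can be grouped across processors in any way, which is exactly the kind of trade-off a cost calculus scores before implementation.

```python
from functools import reduce

# Two Bird-Meertens-style laws, checked concretely (illustrative).
xs = list(range(10))
f = lambda x: x + 1
g = lambda x: 2 * x

two_passes = list(map(f, map(g, xs)))            # map f . map g
one_pass = list(map(lambda x: f(g(x)), xs))      # map (f . g), fused
assert two_passes == one_pass

# Associativity of + lets the reduction tree be chosen by the machine.
left = reduce(lambda a, b: a + b, xs)
split = sum(xs[:5]) + sum(xs[5:])                # any grouping is valid
assert left == split
print(left)                                      # 45
```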

Journal ArticleDOI
TL;DR: An algorithm for solving nonlinear, two-stage stochastic problems with network recourse based on the framework of row-action methods that permits the massively parallel solution of all the scenario subproblems concurrently and achieves computing rates of 276 MFLOPS.
Abstract: We develop an algorithm for solving nonlinear, two-stage stochastic problems with network recourse. The algorithm is based on the framework of row-action methods. The problem is formulated by replicating the first-stage variables and then adding nonanticipativity side constraints. A series of independent deterministic network problems are solved at each step of the algorithm, followed by an iterative step over the nonanticipativity constraints. The solution point of the iterates over the nonanticipativity constraints is obtained analytically. The row-action nature of the algorithm makes it suitable for parallel implementations. A data representation of the problem is developed that permits the massively parallel solution of all the scenario subproblems concurrently. The algorithm is implemented on a Connection Machine CM-2 with up to 32K processing elements and achieves computing rates of 276 MFLOPS. Very large problems (8,192 scenarios, with a deterministic equivalent nonlinear program of 868,367 constraints and 2,474,017 variables) are solved within a few minutes. We report extensive numerical results regarding the effects of stochasticity on the efficiency of the algorithm.
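The step the abstract calls analytic is the projection onto the nonanticipativity constraints: with the first-stage variables replicated per scenario, projecting onto "all copies equal" replaces each copy by the probability-weighted average. A minimal sketch of that step only, with invented numbers:

```python
# Nonanticipativity projection sketch: replace each scenario's copy of
# the first-stage variable by the probability-weighted average.
def project_nonanticipative(copies, probs):
    avg = sum(p * x for p, x in zip(probs, copies))
    return [avg] * len(copies)

copies = [3.0, 5.0, 4.0]        # per-scenario first-stage proposals
probs = [0.5, 0.25, 0.25]       # scenario probabilities
print(project_nonanticipative(copies, probs))   # [3.75, 3.75, 3.75]
```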

Proceedings ArticleDOI
05 Jan 1993
TL;DR: Hill-climbing, simulated annealing and genetic algorithms are search techniques that can be applied to most combinatorial optimization problems and are used to solve the mapping problem, which is the optimal static allocation of communication processes on distributed memory architectures.
Abstract: Hill-climbing, simulated annealing and genetic algorithms are search techniques that can be applied to most combinatorial optimization problems. The three algorithms are used to solve the mapping problem, which is the optimal static allocation of communication processes on distributed memory architectures. Each algorithm is independently evaluated and optimized according to its parameters. The parallelization of the algorithms is also considered. As an example, a massively parallel genetic algorithm is proposed for the problem, and results of its implementation on a 128-processor Supernode are given. A comparative study of the algorithms is then carried out. The criteria of performance considered are the quality of the solutions obtained and the amount of search time used for several benchmarks. A hybrid approach consisting of a combination of genetic algorithms and hill-climbing is also proposed and evaluated.
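As a point of reference, the hill-climbing baseline for the mapping problem fits in a few lines (an illustrative cost model and sketch, not the paper's implementation): place communicating processes on processors so as to minimize traffic-weighted distance, accepting a random swap only when it improves the cost.

```python
import random

# Hill-climbing sketch for the mapping problem (illustrative cost model).
def cost(place, traffic, dist):
    n = len(place)
    return sum(traffic[i][j] * dist[place[i]][place[j]]
               for i in range(n) for j in range(n))

def hill_climb(traffic, dist, iters=2000):
    n = len(traffic)
    place = list(range(n)); random.shuffle(place)
    best = cost(place, traffic, dist)
    for _ in range(iters):
        i, j = random.sample(range(n), 2)
        place[i], place[j] = place[j], place[i]        # try a swap
        c = cost(place, traffic, dist)
        if c < best:
            best = c                                   # keep improvement
        else:
            place[i], place[j] = place[j], place[i]    # undo
    return place, best

traffic = [[0, 5, 1], [5, 0, 2], [1, 2, 0]]   # process communication
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]      # processor distances
print(hill_climb(traffic, dist)[1])
```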