Showing papers in "Journal of Parallel and Distributed Computing in 1991"


Journal ArticleDOI
TL;DR: The results show that for applications with regular data access patterns (the authors evaluate a particle-based simulator used in aeronautics and an LU-decomposition application), prefetching can be very effective, whereas the performance of a distributed-time logic simulation application that made extensive use of pointers and linked lists could be increased by only 30%.

318 citations


Journal ArticleDOI
TL;DR: It is shown that there are reconfigurable machines based on simple network topologies that are capable of solving large classes of problems in constant time, depending on the kinds of switches assumed for the network nodes.

175 citations


Journal ArticleDOI
TL;DR: The alignment technique presented here focuses on minimizing the data movement between processors due to cross-references between multiple distributed arrays, and simplifies the task of data partition and communication generation in the context of a parallelizing compiler for distributed-memory machines.

153 citations


Journal ArticleDOI
TL;DR: The algorithms proposed for these basic communication problems in a hypercube network of processors are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions.

146 citations


Journal ArticleDOI
TL;DR: A two-dimensional buddy system (2DBS) is proposed as a partitioning scheme for dynamic resource allocation in a PMCS, and the internal fragmentation of the proposed 2DBS under various probability distributions of job sizes and processing times is analyzed.

138 citations


Journal ArticleDOI
TL;DR: It is shown that, using the modularity of the SAN models, the Markov chain matrix can be expressed in terms of a generalized tensor product, a formulation that allows the matrix to be stored with considerable memory savings.

135 citations


Journal ArticleDOI
TL;DR: It is found that the twisted cube delivers an improvement in performance over the hypercube, but not nearly as much as the reduction in diameter.

133 citations


Journal ArticleDOI
TL;DR: Two synchronous multiprocessor architectures based on pipelined optical bus interconnections are presented, a linear pipeline with enhanced control strategies and a two-dimensional architecture, which appear to be good candidates for a new generation of hybrid optical-electronic parallel computers.

114 citations


Journal ArticleDOI
TL;DR: Experimentation aimed at determining the potential benefit of mixed-mode SIMD/MIMD parallel architectures is reported, based on timing measurements made on the PASM system prototype at Purdue utilizing carefully coded synthetic variations of a well-known algorithm.

105 citations


Journal ArticleDOI
TL;DR: The syntax and semantics of the DINO language are described, examples of DINO programs are given, a critique of the DINO language features is presented, and the performance of code generated by the DINO compiler is discussed.

101 citations


Journal ArticleDOI
TL;DR: This paper evaluates the performance of the family of multidimensional mesh topologies (which includes the hypercube) under the constant pin-out constraint and shows that higher dimensionality is more important than wider channel width under this constraint.

Journal ArticleDOI
TL;DR: A general theory for modeling and designing fault-tolerant multiprocessor systems in a systematic and efficient manner is presented and the resulting designs are shown to be far superior to those proposed in previous work.

Journal ArticleDOI
TL;DR: This paper describes the implementation of a testbed for load balancing techniques, used for different static and dynamic strategies for balancing the workload of an iPSC/2 implementation of a simple simulation of population evolution.

Journal ArticleDOI
TL;DR: In this article, the authors use the isoefficiency metric to analyze the scalability of parallel algorithms for finding shortest paths between all pairs of nodes in a densely connected graph, and find the classic trade-offs of hardware cost vs. time and memory vs. time to be represented here as trade-offs of hardware cost vs. scalability and memory vs. scalability.

Journal ArticleDOI
TL;DR: This paper shows how imperative language programs can be translated into dataflow graphs and executed on a dataflow machine like Monsoon, and suggests that data flow graphs can serve as an executable intermediate representation in parallelizing compilers.

Journal ArticleDOI
TL;DR: A variant of A* search designed to run on the massively parallel, SIMD Connection Machine (CM-2), called PRA* (for Parallel Retraction A*), is designed to maximize use of the Connection Machine's memory and processors.

Journal ArticleDOI
TL;DR: The chare kernel, which supports parallel computations with irregular structure, is a collection of primitive functions that manage chares, manipulate messages, invoke atomic computations, and coordinate concurrent activities.

Journal ArticleDOI
TL;DR: A new parallel heuristic is described that, on the 32K-processor CM-2 Connection Machine, handles graphs with more than two million edges and gives, in 9 min, partitions that are within 2% of the best ever found.

Journal ArticleDOI
TL;DR: This paper proposes a method called the vertically layered allocation scheme which utilizes heuristic rules in finding a compromise between computation and communication costs in a static data flow environment.

Journal ArticleDOI
TL;DR: An optimal algorithm is described for performing the communication defined by exchanging the bits of the node address with those of the local address; such exchanges typically occur in both matrix transposition and bit reversal for the fast Fourier transform.

Journal ArticleDOI
TL;DR: This work presents a formal solution to the problem of guaranteeing serializable behavior in synchronous parallel production systems that execute many rules simultaneously, and presents a variety of algorithms that implement this solution.

Journal ArticleDOI
TL;DR: Methods for embedding one-, two-, and three-dimensional meshes of trees in the hypercube are described, which have significant practical importance in enhancing the capabilities of the hypercube.

Journal ArticleDOI
TL;DR: It is observed that, for nonuniform images, uniform partitioning does not perform well, whereas static and dynamic partitioning strategies perform well and comparably in most cases.

Journal ArticleDOI
TL;DR: An efficient multiprocessor algorithm to merge m, m ⩾ 2, sorted lists containing N elements is described, which substantially reduces the data access costs in comparison with traditional schemes that successively merge the lists two at a time.

Journal ArticleDOI
TL;DR: Under certain workload assumptions, results show that placement algorithms that are strongly biased toward local frame allocation but are able to borrow remote frames can reduce the number of page faults over strictly local allocation.

Journal ArticleDOI
TL;DR: The design of a benchmark, SLALOM™, that scales automatically to the computing power available is presented; it corrects several deficiencies in various existing benchmarks: it is highly scalable, it solves a real problem, it includes input and output times, and it can be run on parallel machines of all kinds, using any convenient language.

Journal ArticleDOI
TL;DR: This work gives an efficient algorithm to find the minimum-cost way to evaluate an expression, for several different data parallel architectures, and applies to any architecture in which the metric describing the cost of moving an array has a property the authors call "robustness".

Journal ArticleDOI
TL;DR: This work examines design alternatives for ordered radix-2 DIF (decimation-in-frequency) FFT algorithms on massively parallel hypercube multiprocessors such as the Connection Machine; it combines the order and computational phases of the FFT and also uses sequence-to-processor maps that reduce communication.

Journal ArticleDOI
TL;DR: It is argued that multiprocessors based on a fast global control unit, capable of fast execution of serial code and of managing an ensemble of slower processors, offer a performance/cost ratio significantly better than any comparable homogeneous multiprocessor with distributed control.

Journal ArticleDOI
TL;DR: The proposed processor-efficient parallel algorithm for the 0/1 knapsack problem has optimal time speedup and processor efficiency over the best known sequential algorithm and performs very well for a wide range of input sizes.