Showing papers in "Journal of Parallel and Distributed Computing in 1991"
••
TL;DR: The results show that for applications with regular data access patterns—the authors evaluate a particle-based simulator used in aeronautics and an LU-decomposition application—prefetching can be very effective, whereas the performance of a distributed-time logic simulation application that made extensive use of pointers and linked lists could be increased by only 30%.
318 citations
••
TL;DR: It is shown that there are reconfigurable machines based on simple network topologies that are capable of solving large classes of problems in constant time, depending on the kinds of switches assumed for the network nodes.
175 citations
••
TL;DR: The alignment technique presented here focuses on minimizing the data movement between processors due to cross-references between multiple distributed arrays, and simplifies the tasks of data partitioning and communication generation in the context of a parallelizing compiler for distributed-memory machines.
153 citations
••
TL;DR: The algorithms proposed for these basic communication problems in a hypercube network of processors are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions.
146 citations
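One of the classic hypercube communication primitives covered by such analyses is single-node broadcast by recursive doubling, which completes in exactly d = log2(p) steps on a d-dimensional hypercube. The sketch below (our illustration, not the paper's algorithm; all names are ours) builds the step-by-step communication schedule:

```python
# Hypothetical sketch: single-node broadcast on a d-dimensional hypercube
# by recursive doubling. Each step flips one address bit, so after d steps
# all 2**d nodes hold the message.

def hypercube_broadcast_schedule(d, source=0):
    """Return, per step, the (sender, receiver) pairs for a broadcast
    from `source` on a d-dimensional hypercube (p = 2**d nodes)."""
    schedule = []
    holders = {source}
    for step in range(d):
        pairs = []
        for node in sorted(holders):
            partner = node ^ (1 << step)   # flip one address bit per step
            if partner not in holders:
                pairs.append((node, partner))
        for s, r in pairs:
            holders.add(r)
        schedule.append(pairs)
    return schedule

sched = hypercube_broadcast_schedule(3)
# After d = 3 steps, every one of the 8 nodes holds the message.
```

This meets the lower bound for broadcast on the hypercube: d time steps, with the number of senders doubling at each step.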
••
TL;DR: A two-dimensional buddy system (2DBS) is proposed as a partitioning scheme for dynamic resource allocation in a PMCS and internal fragmentation of the proposed 2DBS under various probability distributions of job sizes and processing times is analyzed.
138 citations
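The splitting rule behind a two-dimensional buddy system can be sketched briefly (this is our illustration, not the paper's code): a free 2^k × 2^k submesh is split into four 2^(k-1) × 2^(k-1) buddies until a request fits, and unused buddies return to the free list.

```python
# Minimal 2D-buddy splitting sketch (hypothetical names). A square free
# region of power-of-two side is repeatedly quartered; one quadrant is
# allocated and the other three buddies become free regions.

def split_to_fit(region, need):
    """region = (row, col, size); need = required side (power of 2).
    Returns (allocated_region, new_free_regions)."""
    r, c, size = region
    free = []
    while size > need:
        half = size // 2
        # keep the top-left quadrant, free the other three buddies
        free += [(r, c + half, half), (r + half, c, half),
                 (r + half, c + half, half)]
        size = half
    return (r, c, size), free

alloc, free = split_to_fit((0, 0, 8), 2)
# alloc is a 2x2 submesh at (0, 0); two levels of splitting leave
# six free buddy regions.
```

Rounding each request up to the next power-of-two side is exactly what creates the internal fragmentation the paper analyzes.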
••
TL;DR: It is shown that this formulation of the Markov chain matrix can be expressed in terms of a generalized tensor product, using the modularity of the SAN models, which allows the matrix to be stored with considerable memory savings.
135 citations
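The memory saving comes from Kronecker structure: for independent automata, the global generator is the Kronecker sum of the small per-automaton generators, so only the small blocks need be stored. A pure-Python sketch (our illustration of the general idea, not the paper's generalized tensor algebra):

```python
# Two independent 2-state automata with generators Q1, Q2 have global
# generator Q = Q1 (+) Q2 = kron(Q1, I) + kron(I, Q2): a 4x4 matrix
# recoverable from two 2x2 blocks. Names are ours, not the paper's.

def kron(A, B):
    n, m = len(A), len(B)
    return [[A[i][j] * B[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]

def kron_sum(Q1, Q2):
    I1 = [[1.0 if i == j else 0.0 for j in range(len(Q1))] for i in range(len(Q1))]
    I2 = [[1.0 if i == j else 0.0 for j in range(len(Q2))] for i in range(len(Q2))]
    K1, K2 = kron(Q1, I2), kron(I1, Q2)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]

Q1 = [[-1.0, 1.0], [2.0, -2.0]]
Q2 = [[-3.0, 3.0], [4.0, -4.0]]
Q = kron_sum(Q1, Q2)   # 4x4 generator; every row still sums to 0
```

For N automata with k states each, the Kronecker form stores N matrices of size k × k instead of one matrix of size k^N × k^N.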
••
TL;DR: It is found that the twisted cube delivers an improvement in performance over the hypercube, but not nearly as much as the reduction in diameter.
133 citations
••
TL;DR: Two synchronous multiprocessor architectures based on pipelined optical bus interconnections are presented, one a two-dimensional architecture and one a linear pipeline with enhanced control strategies; both appear to be good candidates for a new generation of hybrid optical-electronic parallel computers.
114 citations
••
TL;DR: Experimentation aimed at determining the potential benefit of mixed-mode SIMD/MIMD parallel architectures is reported, based on timing measurements made on the PASM system prototype at Purdue utilizing carefully coded synthetic variations of a well-known algorithm.
105 citations
••
TL;DR: The syntax and semantics of the DINO language are described, examples of DINO programs are given, a critique of the DINO language features is presented, and the performance of code generated by the DINO compiler is discussed.
101 citations
••
TL;DR: This paper evaluates the performance of the family of multidimensional mesh topologies (which includes the hypercube) under the constant pin-out constraint and shows that higher dimensionality is more important than wider channel width under this constraint.
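The dimensionality effect is easy to see with back-of-the-envelope arithmetic (our sketch, not the paper's pin-out model): for a fixed node count N, a d-dimensional mesh with radix k = N^(1/d) has diameter d·(k − 1), so raising the dimension shrinks the diameter sharply even before channel widths enter the analysis.

```python
# Diameter of a d-dimensional mesh with N nodes and radix k = N**(1/d).
# Illustrative arithmetic only; the paper's evaluation additionally
# accounts for the channel widths a constant pin-out budget allows.

def mesh_diameter(N, d):
    k = round(N ** (1 / d))      # radix (nodes per dimension)
    return d * (k - 1)

N = 4096
dims = {d: mesh_diameter(N, d) for d in (1, 2, 3, 4, 6, 12)}
# e.g. d=2 is a 64x64 mesh with diameter 126, while d=12 is the
# binary 12-cube (hypercube) with diameter 12.
```

The paper's finding is that this diameter advantage outweighs the narrower channels that higher dimensionality forces under a fixed pin budget.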
••
TL;DR: A general theory for modeling and designing fault-tolerant multiprocessor systems in a systematic and efficient manner is presented and the resulting designs are shown to be far superior to those proposed in previous work.
••
TL;DR: This paper describes the implementation of a testbed for load-balancing techniques, used to compare static and dynamic strategies for balancing the workload of an iPSC/2 implementation of a simple simulation of population evolution.
••
TL;DR: In this article, the authors use the isoefficiency metric to analyze the scalability of parallel algorithms for finding shortest paths between all pairs of nodes in a densely connected graph, and find the classic trade-offs of hardware cost vs. scalability and of memory vs. time to be represented here as well.
••
TL;DR: This paper shows how imperative language programs can be translated into dataflow graphs and executed on a dataflow machine like Monsoon, and suggests that data flow graphs can serve as an executable intermediate representation in parallelizing compilers.
••
TL;DR: A variant of A* search, called PRA* (for Parallel Retraction A*), runs on the massively parallel SIMD Connection Machine (CM-2) and is designed to maximize use of the machine's memory and processors.
••
TL;DR: The chare kernel is a collection of primitive functions that manage chares, manipulate messages, invoke atomic computations, and coordinate concurrent activities; it supports parallel computations with irregular structure.
••
TL;DR: A new parallel heuristic is described that, on the 32K-processor CM-2 Connection Machine, handles graphs with more than two million edges and produces, in 9 minutes, partitions that are within 2% of the best ever found.
••
TL;DR: This paper proposes a method called the vertically layered allocation scheme which utilizes heuristic rules in finding a compromise between computation and communication costs in a static data flow environment.
••
TL;DR: An optimal algorithm is described for the communication pattern in which the bits of the node address are exchanged with those of the local address, as arises in both matrix transposition and bit reversal for the fast Fourier transform.
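The index permutation in question can be sketched directly (our formulation, not the paper's notation): a global index g splits into a node-address field and a local-address field, and the communication sends element g to the index in which the two fields have traded places. With equal field widths this is exactly a transpose of a row-distributed square matrix.

```python
# Swap the node-address and local-address bit fields of a global index.
# Field widths are parameters; names are illustrative.

def swap_fields(g, node_bits, local_bits):
    node = g >> local_bits
    local = g & ((1 << local_bits) - 1)
    # the old local bits become the new node address, and vice versa
    return (local << node_bits) | node

# 4x4 matrix distributed by rows over 4 nodes: the element at global
# index 6 (row 1, col 2) moves to index 9 (row 2, col 1) -- a transpose.
assert swap_fields(6, node_bits=2, local_bits=2) == 9
```

Applying the permutation twice returns every index to its origin, which is why a single optimal routing schedule serves both directions of the exchange.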
••
TL;DR: This work presents a formal solution to the problem of guaranteeing serializable behavior in synchronous parallel production systems that execute many rules simultaneously, and presents a variety of algorithms that implement this solution.
••
TL;DR: Methods for embedding one-, two-, and three-dimensional meshes of trees in the hypercube are described, which have significant practical importance in enhancing the capabilities of the hypercube.
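The standard tool behind such embeddings (a sketch of the general technique, not the paper's specific construction) is the binary reflected Gray code: it maps a 2^d-node path into the d-cube with dilation 1, because consecutive codes differ in exactly one bit.

```python
# Binary reflected Gray code: node i of a path maps to hypercube
# address i ^ (i >> 1), so path neighbors are hypercube neighbors.

def gray(i):
    return i ^ (i >> 1)

path = [gray(i) for i in range(8)]
# consecutive hypercube addresses differ in exactly one bit
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(path, path[1:]))
```

Multidimensional meshes are embedded the same way, by applying a Gray code independently along each dimension and concatenating the resulting address fields.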
••
TL;DR: It is observed that for nonuniform images uniform partitioning does not perform well, whereas static and dynamic partitioning strategies perform well and comparably in most cases.
••
TL;DR: An efficient multiprocessor algorithm to merge m, m ⩾ 2, sorted lists containing N elements is described, which substantially reduces the data access costs in comparison with traditional schemes that successively merge the lists two at a time.
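The sequential core of an m-way merge can be sketched with a heap (a generic illustration, not the paper's multiprocessor algorithm): all m lists are merged in one pass over the N elements, rather than in log2(m) successive two-at-a-time passes that each re-read the data.

```python
# m-way merge via a min-heap keyed on (value, list index, position).
# One pop and at most one push per output element. Names are ours.
import heapq

def multiway_merge(lists):
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out

multiway_merge([[1, 4, 7], [2, 5], [3, 6, 8]])
# → [1, 2, 3, 4, 5, 6, 8, 7] is wrong; the heap yields [1, 2, 3, 4, 5, 6, 7, 8]
```

Touching each element once instead of log2(m) times is the source of the reduced data-access cost the abstract refers to.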
••
TL;DR: Under certain workload assumptions, results show that placement algorithms that are strongly biased toward local frame allocation but are able to borrow remote frames can reduce the number of page faults over strictly local allocation.
••
TL;DR: The design of a benchmark, SLALOM™, is presented that scales automatically to the computing power available and corrects several deficiencies in various existing benchmarks: it is highly scalable, it solves a real problem, it includes input and output times, and it can be run on parallel machines of all kinds, using any convenient language.
••
TL;DR: This work gives an efficient algorithm to find the minimum-cost way to evaluate an expression, for several different data parallel architectures, and applies to any architecture in which the metric describing the cost of moving an array has a property the authors call “robustness”.
••
TL;DR: This work examines design alternatives for ordered radix-2 DIF (decimation-in-frequency) FFT algorithms on massively parallel hypercube multiprocessors such as the Connection Machine; it combines the ordering and computational phases of the FFT and also uses sequence-to-processor maps that reduce communication.
••
TL;DR: It is argued that multiprocessors based on a fast global control unit, capable of fast execution of serial code and of managing an ensemble of slower processors, offer a performance/cost ratio significantly better than any comparable homogeneous multiprocessor with distributed control.
••
TL;DR: The proposed processor-efficient parallel algorithm for the 0/1 knapsack problem has optimal time speedup and processor efficiency over the best known sequential algorithm and performs very well for a wide range of input sizes.
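The sequential baseline against which such speedups are measured is the standard O(nC) dynamic program for 0/1 knapsack; a minimal sketch (ours, not the paper's parallel formulation):

```python
# Classic 0/1 knapsack DP over a single capacity-indexed array.
# The reverse capacity loop ensures each item is used at most once.

def knapsack(weights, values, capacity):
    dp = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

knapsack([2, 3, 4], [3, 4, 6], 6)
# → 9  (take the items of weight 2 and 4)
```

A parallel formulation typically distributes the capacity dimension across processors, since all entries of one item's update round are independent.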