Journal ArticleDOI

Matrix algorithms on a hypercube I: Matrix multiplication

TLDR
In this paper, the authors discuss algorithms for matrix multiplication on a concurrent processor containing a two-dimensional mesh or richer topology, and present detailed performance measurements on hypercubes with 4, 16, and 64 nodes.
Abstract
We discuss algorithms for matrix multiplication on a concurrent processor containing a two-dimensional mesh or richer topology. We present detailed performance measurements on hypercubes with 4, 16, and 64 nodes, and analyze them in terms of communication overhead and load balancing. We show that the decomposition into square subblocks is optimal. C code implementing the algorithms is available.
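As a rough illustration of the square-subblock decomposition, here is a minimal serial sketch (not the paper's distributed C code; N, Q, and the loop structure are assumptions for illustration). The outer three loops walk the subblock grid the way a mesh of nodes would, with the k-loop standing in for the communication steps that pass A and B subblocks between nodes:

```c
#include <string.h>

#define N 8       /* matrix dimension (assumed for illustration)   */
#define Q 2       /* process grid is Q x Q, so P = Q*Q = 4 nodes   */
#define S (N/Q)   /* edge of each square subblock; Q must divide N */

/* C = A * B viewed as a Q x Q grid of S x S subblocks.  On a mesh
   or hypercube, node (p,q) would own subblocks A(p,q), B(p,q), and
   C(p,q); serially, the decomposition just reorders the classic
   triple loop. */
void block_matmul(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    memset(C, 0, sizeof(double) * N * N);
    for (int p = 0; p < Q; p++)              /* block row of C    */
        for (int q = 0; q < Q; q++)          /* block column of C */
            for (int k = 0; k < Q; k++)      /* "communication" step */
                for (int i = p * S; i < (p + 1) * S; i++)
                    for (int j = q * S; j < (q + 1) * S; j++)
                        for (int l = k * S; l < (k + 1) * S; l++)
                            C[i][j] += A[i][l] * B[l][j];
}
```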


Citations
Journal ArticleDOI

PVM: a framework for parallel distributed computing

TL;DR: The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components that operate on a collection of heterogeneous computing elements interconnected by one or more networks.
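For flavor, a minimal PVM 3 master sketch (the "worker" task name is hypothetical, error handling is omitted, and a running PVM daemon is assumed; this illustrates the message-passing style, not anything specific to the paper):

```c
#include <stdio.h>
#include <pvm3.h>

int main(void) {
    int tid, n = 42, reply;

    /* Spawn one copy of a (hypothetical) "worker" executable. */
    if (pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &tid) != 1) {
        fprintf(stderr, "spawn failed\n");
        pvm_exit();
        return 1;
    }
    pvm_initsend(PvmDataDefault);   /* start a message buffer   */
    pvm_pkint(&n, 1, 1);            /* pack one int, stride 1   */
    pvm_send(tid, 1);               /* send with message tag 1  */

    pvm_recv(tid, 2);               /* block on a tag-2 reply   */
    pvm_upkint(&reply, 1, 1);
    printf("worker replied %d\n", reply);
    pvm_exit();
    return 0;
}
```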
Journal ArticleDOI

Properties and performance of folded hypercubes

TL;DR: A new hypercube-type structure, the folded hypercube (FHC), which is basically a standard hypercube with some extra links established between its nodes, is proposed and analyzed and it is shown that this structure offers substantial improvement over existing hyper cube-type networks in terms of the above-mentioned network parameters.
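The extra links are easy to state in code: besides its n ordinary hypercube neighbors, each node gains one link to its bitwise complement. A small sketch of the FHC definition (my illustration, not code from the paper):

```c
/* Neighbors of node v in a folded hypercube FHC(n): the n standard
   hypercube links (flip one address bit) plus one extra "fold" link
   to the bitwise complement.  Fills nbr (size n+1) and returns the
   neighbor count. */
int fhc_neighbors(unsigned v, int n, unsigned nbr[]) {
    unsigned mask = (1u << n) - 1u;
    for (int i = 0; i < n; i++)
        nbr[i] = v ^ (1u << i);   /* regular dimension link */
    nbr[n] = (~v) & mask;         /* complementary link     */
    return n + 1;
}
```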
Book

Beowulf Cluster Computing with Linux

TL;DR: The second edition of Beowulf Cluster Computing with Linux has been completely updated; all three stand-alone sections have important new material.
Journal ArticleDOI

Communication lower bounds for distributed-memory matrix multiplication

TL;DR: Lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer are presented, and it is shown that in any algorithm that uses O(n^2/P^(2/3)) words of memory per processor, at least one processor must send or receive Ω(n^2/P^(1/2)) words.
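Bounds of this kind typically follow from a per-processor memory-communication tradeoff; a common form in this literature (stated from general knowledge, not quoted from the paper) is W = Ω(n^3 / (P · sqrt(M))), where M is the number of words of memory per processor and W the words some processor must communicate. Taking M = Θ(n^2/P), the usual 2D regime, gives sqrt(M) = n/sqrt(P) and hence W = Ω(n^2/P^(1/2)).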
Posted Content

"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

TL;DR: In this paper, the authors propose a technique called Short-Dot to reduce the number of redundant computations in a coding theory inspired fashion for computing linear transforms of long vectors.
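A toy example of the coded-computation idea behind such schemes (illustration only; Short-Dot additionally shortens each encoded vector, which this sketch does not capture). Three workers compute encoded dot products so that any two of the three results recover both original outputs, tolerating one straggler:

```c
#include <stdio.h>

#define N 4

/* Plain dot product <a, x>. */
static double dot(const double *a, const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * x[i];
    return s;
}

int main(void) {
    double a1[N] = {1, 2, 3, 4}, a2[N] = {4, 3, 2, 1};
    double x[N]  = {1, 1, 1, 1};
    double a3[N];                 /* encoded row: a1 + a2 */
    for (int i = 0; i < N; i++)
        a3[i] = a1[i] + a2[i];

    /* Workers 2 and 3 finish; the worker holding a1 straggles. */
    double r2 = dot(a2, x, N);
    double r3 = dot(a3, x, N);

    /* Recover the missing output: <a1,x> = <a1+a2,x> - <a2,x>. */
    printf("recovered <a1,x> = %g\n", r3 - r2);
    return 0;
}
```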
References
Journal ArticleDOI

Algorithms for concurrent processors

Geoffrey C. Fox et al.
01 May 1984
TL;DR: Computing is on the verge of a revolution, spawned by advances in computer technology that will make it practical to build very-high-performance computers, or "supercomputers," consisting of very many small computers combined to form a single concurrent processor.

Parallel Cholesky factorization on a hypercube multiprocessor

TL;DR: Two types of message-passing parallel algorithms are developed for solving symmetric systems of linear equations on a hypercube multiprocessor that involve broadcast communication among processors and communication along a ring of processors.
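A ring maps onto a hypercube via the binary-reflected Gray code, so consecutive ring positions are hypercube neighbors; a one-line sketch of this standard technique (not the paper's code):

```c
/* Ring position r (0 <= r < 2^d) -> hypercube node id.  Consecutive
   Gray codes differ in exactly one bit, so each ring hop uses a
   single hypercube link, and the cycle closes back at position 0. */
unsigned ring_to_node(unsigned r) {
    return r ^ (r >> 1);
}
```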
Journal ArticleDOI

Pure Gauge SU(3) Lattice Theory on an Array of Computers

TL;DR: The availability of a substantial number of computer cycles, coupled with an improvement in the algorithm, made possible a high-statistics determination of the heavy-quark potential on a 12^3 x 16 lattice.
Proceedings ArticleDOI

Dense matrix operations on a torus and a boolean cube

TL;DR: Algorithms for matrix multiplication and for Gauss-Jordan and Gaussian elimination on dense matrices on a torus and a boolean cube are presented and analyzed with respect to communication and arithmetic complexity.
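For a torus, the roll-based alignment is easy to state; this sketch (a standard Cannon-style scheme, offered as an illustration rather than this paper's exact formulation) gives the subblock indices each processor multiplies at step k:

```c
/* On a Q x Q torus running a Cannon-style multiply, processor (p,q)
   multiplies A-subblock (p, r) by B-subblock (r, q) at step k, where
   r = (p + q + k) mod Q.  Rolling A left and B up by one link per
   step advances r, so Q steps cover the whole block inner product. */
void cannon_step(int p, int q, int k, int Q, int *a_col, int *b_row) {
    int r = (p + q + k) % Q;
    *a_col = r;   /* column of the A subblock held at this step */
    *b_row = r;   /* row of the B subblock held at this step    */
}
```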