Journal ArticleDOI

Matrix algorithms on a hypercube I: Matrix multiplication

TLDR
In this paper, the authors discuss algorithms for matrix multiplication on a concurrent processor containing a two-dimensional mesh or richer topology, and present detailed performance measurements on hypercubes with 4, 16, and 64 nodes.
Abstract
We discuss algorithms for matrix multiplication on a concurrent processor containing a two-dimensional mesh or richer topology. We present detailed performance measurements on hypercubes with 4, 16, and 64 nodes, and analyze them in terms of communication overhead and load balancing. We show that the decomposition into square subblocks is optimal. C code implementing the algorithms is available.
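As a rough illustration of the square-subblock decomposition, here is a minimal serial sketch (not the paper's distributed C code; N, Q, and the loop structure are assumptions for illustration). The outer three loops walk the subblock grid the way a mesh of nodes would, with the k-loop standing in for the communication steps that pass A and B subblocks between nodes:

```c
#include <string.h>

#define N 8       /* matrix dimension (assumed for illustration)   */
#define Q 2       /* process grid is Q x Q, so P = Q*Q = 4 nodes   */
#define S (N/Q)   /* edge of each square subblock; Q must divide N */

/* C = A * B viewed as a Q x Q grid of S x S subblocks.  On a mesh
   or hypercube, node (p,q) would own subblocks A(p,q), B(p,q), and
   C(p,q); serially, the decomposition just reorders the classic
   triple loop. */
void block_matmul(const double A[N][N], const double B[N][N],
                  double C[N][N]) {
    memset(C, 0, sizeof(double) * N * N);
    for (int p = 0; p < Q; p++)              /* block row of C    */
        for (int q = 0; q < Q; q++)          /* block column of C */
            for (int k = 0; k < Q; k++)      /* "communication" step */
                for (int i = p * S; i < (p + 1) * S; i++)
                    for (int j = q * S; j < (q + 1) * S; j++)
                        for (int l = k * S; l < (k + 1) * S; l++)
                            C[i][j] += A[i][l] * B[l][j];
}
```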


Citations
Journal ArticleDOI

PVM: a framework for parallel distributed computing

TL;DR: The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components that operate on a collection of heterogeneous computing elements interconnected by one or more networks.
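For flavor, a minimal PVM 3 master sketch (the "worker" task name is hypothetical, error handling is omitted, and a running PVM daemon is assumed; this illustrates the message-passing style, not anything specific to the paper):

```c
#include <stdio.h>
#include <pvm3.h>

int main(void) {
    int tid, n = 42, reply;

    /* Spawn one copy of a (hypothetical) "worker" executable. */
    if (pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &tid) != 1) {
        fprintf(stderr, "spawn failed\n");
        pvm_exit();
        return 1;
    }
    pvm_initsend(PvmDataDefault);   /* start a message buffer   */
    pvm_pkint(&n, 1, 1);            /* pack one int, stride 1   */
    pvm_send(tid, 1);               /* send with message tag 1  */

    pvm_recv(tid, 2);               /* block on a tag-2 reply   */
    pvm_upkint(&reply, 1, 1);
    printf("worker replied %d\n", reply);
    pvm_exit();
    return 0;
}
```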
Journal ArticleDOI

Properties and performance of folded hypercubes

TL;DR: A new hypercube-type structure, the folded hypercube (FHC), which is basically a standard hypercube with some extra links established between its nodes, is proposed and analyzed and it is shown that this structure offers substantial improvement over existing hyper cube-type networks in terms of the above-mentioned network parameters.
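The extra links are easy to state in code: besides its n ordinary hypercube neighbors, each node gains one link to its bitwise complement. A small sketch of the FHC definition (my illustration, not code from the paper):

```c
/* Neighbors of node v in a folded hypercube FHC(n): the n standard
   hypercube links (flip one address bit) plus one extra "fold" link
   to the bitwise complement.  Fills nbr (size n+1) and returns the
   neighbor count. */
int fhc_neighbors(unsigned v, int n, unsigned nbr[]) {
    unsigned mask = (1u << n) - 1u;
    for (int i = 0; i < n; i++)
        nbr[i] = v ^ (1u << i);   /* regular dimension link */
    nbr[n] = (~v) & mask;         /* complementary link     */
    return n + 1;
}
```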
Book

Beowulf Cluster Computing with Linux

TL;DR: The second edition of Beowulf Cluster Computing with Linux has been completely updated; all three stand-alone sections have important new material.
Journal ArticleDOI

Communication lower bounds for distributed-memory matrix multiplication

TL;DR: Lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer are presented, and it is shown that in any algorithm that uses O(n^2/P^(2/3)) words of memory per processor, at least one processor must send or receive Ω(n^2/P^(1/2)) words.
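Bounds of this kind typically follow from a per-processor memory-communication tradeoff; a common form in this literature (stated from general knowledge, not quoted from the paper) is W = Ω(n^3 / (P · sqrt(M))), where M is the number of words of memory per processor and W the words some processor must communicate. Taking M = Θ(n^2/P), the usual 2D regime, gives sqrt(M) = n/sqrt(P) and hence W = Ω(n^2/P^(1/2)).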
Posted Content

"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

TL;DR: In this paper, the authors propose a technique called Short-Dot to reduce the number of redundant computations in a coding theory inspired fashion for computing linear transforms of long vectors.
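A toy example of the coded-computation idea behind such schemes (illustration only; Short-Dot additionally shortens each encoded vector, which this sketch does not capture). Three workers compute encoded dot products so that any two of the three results recover both original outputs, tolerating one straggler:

```c
#include <stdio.h>

#define N 4

/* Plain dot product <a, x>. */
static double dot(const double *a, const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * x[i];
    return s;
}

int main(void) {
    double a1[N] = {1, 2, 3, 4}, a2[N] = {4, 3, 2, 1};
    double x[N]  = {1, 1, 1, 1};
    double a3[N];                 /* encoded row: a1 + a2 */
    for (int i = 0; i < N; i++)
        a3[i] = a1[i] + a2[i];

    /* Workers 2 and 3 finish; the worker holding a1 straggles. */
    double r2 = dot(a2, x, N);
    double r3 = dot(a3, x, N);

    /* Recover the missing output: <a1,x> = <a1+a2,x> - <a2,x>. */
    printf("recovered <a1,x> = %g\n", r3 - r2);
    return 0;
}
```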
References
Journal ArticleDOI

Algorithms for concurrent processors

Geoffrey C. Fox et al.
01 May 1984
TL;DR: Computing is on the verge of a revolution, spawned by advances in computer technology that will make it practical to build very-high-performance computers, or "supercomputers," consisting of very many small computers combined to form a single concurrent processor.

Parallel Cholesky factorization on a hypercube multiprocessor

TL;DR: Two types of message-passing parallel algorithms are developed for solving symmetric systems of linear equations on a hypercube multiprocessor that involve broadcast communication among processors and communication along a ring of processors.
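A ring maps onto a hypercube via the binary-reflected Gray code, so consecutive ring positions are hypercube neighbors; a one-line sketch of this standard technique (not the paper's code):

```c
/* Ring position r (0 <= r < 2^d) -> hypercube node id.  Consecutive
   Gray codes differ in exactly one bit, so each ring hop uses a
   single hypercube link, and the cycle closes back at position 0. */
unsigned ring_to_node(unsigned r) {
    return r ^ (r >> 1);
}
```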
Journal ArticleDOI

Pure Gauge SU(3) Lattice Theory on an Array of Computers

TL;DR: The availability of a substantial number of computer cycles, coupled with an improvement in the algorithm, made possible a high-statistics determination of the heavy-quark potential on a 12^3 x 16 lattice.
Proceedings ArticleDOI

Dense matrix operations on a torus and a boolean cube

TL;DR: Algorithms for matrix multiplication and for Gauss-Jordan and Gaussian elimination on dense matrices on a torus and a boolean cube are presented and analyzed with respect to communication and arithmetic complexity.
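For a torus, the roll-based alignment is easy to state; this sketch (a standard Cannon-style scheme, offered as an illustration rather than this paper's exact formulation) gives the subblock indices each processor multiplies at step k:

```c
/* On a Q x Q torus running a Cannon-style multiply, processor (p,q)
   multiplies A-subblock (p, r) by B-subblock (r, q) at step k, where
   r = (p + q + k) mod Q.  Rolling A left and B up by one link per
   step advances r, so Q steps cover the whole block inner product. */
void cannon_step(int p, int q, int k, int Q, int *a_col, int *b_row) {
    int r = (p + q + k) % Q;
    *a_col = r;   /* column of the A subblock held at this step */
    *b_row = r;   /* row of the B subblock held at this step    */
}
```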