
Showing papers by Thomas H. Cormen published in 1996



Journal Article
01 Sep 1996
TL;DR: This paper analyzes timing results on a uniprocessor with several disks for two PDM algorithms, out-of-core radix sort and BMMC permutations, to determine the strengths and weaknesses of the PDM.
Abstract: Although several algorithms have been developed for the Parallel Disk Model (PDM), few have been implemented. Consequently, little has been known about the accuracy of the PDM in measuring I/O time and total time to perform an out-of-core computation. This paper analyzes timing results on a uniprocessor with several disks for two PDM algorithms, out-of-core radix sort and BMMC permutations, to determine the strengths and weaknesses of the PDM. The results indicate the following. First, good PDM algorithms are usually not I/O bound. Second, of the four PDM parameters, two (problem size and memory size) are good indicators of I/O time and running time, but the other two (block size and number of disks) are not. Third, because PDM algorithms tend not to be I/O bound, asynchronous I/O effectively hides I/O times. The software interface to the PDM is part of the ViC* run-time library. The interface is a set of wrappers that are designed to be both efficient and portable across several parallel file systems and target machines.

36 citations
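
The four PDM parameters named in this abstract can be made concrete with a small cost model. In Vitter and Shriver's PDM, N records are processed with memory for M records, data move in blocks of B records, and one parallel I/O transfers at most one block on each of D disks, so reading or writing all N records costs about N/(BD) parallel I/Os. The sketch below estimates the parallel I/Os of an out-of-core radix sort under assumptions that are NOT taken from the paper: every pass reads and writes each record exactly once, and each pass sorts on an r-bit digit with one block-sized buffer per bucket, so that 2^r · B ≤ M. The function names are hypothetical.

```python
def parallel_ios_per_pass(n_records: int, block_size: int, num_disks: int) -> int:
    """PDM parallel I/Os to read (or to write) all N records once.

    One parallel I/O moves at most one block of B records on each of D disks,
    so the count is ceil(N / (B * D)), computed here in exact integer arithmetic.
    """
    per_io = block_size * num_disks
    return (n_records + per_io - 1) // per_io


def radix_sort_parallel_ios(n_records: int, memory_size: int, block_size: int,
                            num_disks: int, key_bits: int) -> int:
    """Hypothetical PDM estimate for an out-of-core LSD radix sort.

    ASSUMPTIONS (illustrative, not from the paper): each pass sorts on one
    r-bit digit, needs one B-record buffer per bucket (2**r * B <= M), and
    reads and writes every record exactly once.
    """
    blocks_in_memory = memory_size // block_size
    digit_bits = blocks_in_memory.bit_length() - 1  # largest r with 2**r <= M/B
    if digit_bits < 1:
        raise ValueError("memory must hold at least two blocks")
    passes = (key_bits + digit_bits - 1) // digit_bits  # ceil(key_bits / r)
    # Each pass costs one full read plus one full write of the data.
    return passes * 2 * parallel_ios_per_pass(n_records, block_size, num_disks)


# N = 2**20 records, M = 2**16, B = 2**8, D = 4, 32-bit keys:
# r = 8, so 4 passes of 2 * 1024 parallel I/Os each.
print(radix_sort_parallel_ios(2**20, 2**16, 2**8, 4, 32))  # 8192
```

In this model the predicted count falls in proportion to B·D, yet the measurements summarized above found block size and number of disks to be poor indicators of I/O time and running time, while problem size and memory size tracked them well. A model like this is a baseline to compare measurements against, not a forecast.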


01 Jan 1996
TL;DR: This thesis defines in-place algorithms for permuting out-of-core data, shows how to perform BMMC, mesh, and torus permutations efficiently in place, and defines an application programming interface (API) for performing parallel data accesses in the manner suggested by Vitter and Shriver's Parallel Disk Model (PDM).
Abstract: For many scientific applications, the data set cannot entirely fit in main memory. The data must reside out-of-core, i.e., on parallel disks. For many basic data-movement operations such as permuting, if the programmer does not design efficient algorithms for accessing data on the parallel disks, the penalty can be quite severe. As processor speeds increase more rapidly than disk speeds, additional disk accesses become even more costly in a relative sense. In this thesis, we focus on efficiently performing the data movement for permutations when the data reside out-of-core. We present a unified approach for performing bit-matrix-multiply/complement (BMMC) permutations at different levels of the memory hierarchy. Our unified approach uses a linear-algebraic technique to decompose an arbitrary BMMC permutation into a number of permutations which we know how to perform efficiently. We show that this technique is flexible enough to apply at the following levels of memory abstraction: parallel disk access, interprocessor communication on distributed memory multiprocessors (with the processors connected by either a mesh or a multistage network), uniprocessor memory access, and the design of combinational circuits. The BMMC permutations include commonly used permutations such as matrix transposition, bit-reversal permutations (used in performing FFTs), vector-reversal permutations, hypercube permutations, matrix reblocking, and permutations used by fast cosine transforms (FCTs). This thesis presents additional work at the parallel disk level of abstraction. We define in-place algorithms for permuting out-of-core data and show how to efficiently perform BMMC, mesh, and torus permutations in place. We also define an application user interface (API) for performing parallel data accesses in the manner suggested by Vitter and Shriver's Parallel Disk Model (PDM).

6 citations
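
The BMMC permutations at the center of this thesis have a compact definition: on N = 2^n records, a BMMC permutation sends the record at n-bit source address x to target address y = Ax ⊕ c, where A is a nonsingular n × n matrix over GF(2) and c is an n-bit complement vector (products are AND, sums are XOR). The in-memory Python sketch below applies such a permutation. Storing each row of A as an integer bitmask is an encoding chosen here for illustration, and the code makes no attempt at the out-of-core or in-place algorithms the thesis develops.

```python
from __future__ import annotations


def bmmc_target(x: int, rows: list[int], complement: int) -> int:
    """Return y = A x XOR c over GF(2).

    rows[i] is row i of A as a bitmask (bit j of rows[i] is A[i][j]), so bit i
    of A x is the parity of (rows[i] AND x).
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ complement


def bmmc_permute(data: list, rows: list[int], complement: int = 0) -> list:
    """Return a new list in which data[x] has moved to position A x XOR c.

    len(data) must be 2**n where n = len(rows). A singular A maps two source
    addresses to the same target, which is reported as an error.
    """
    n = len(rows)
    if len(data) != 1 << n:
        raise ValueError("len(data) must be 2**len(rows)")
    if not 0 <= complement < 1 << n:
        raise ValueError("complement must be an n-bit value")
    out = [None] * len(data)
    filled = [False] * len(data)
    for x, value in enumerate(data):
        y = bmmc_target(x, rows, complement)
        if filled[y]:
            raise ValueError("A is singular, so this is not a permutation")
        filled[y] = True
        out[y] = value
    return out
```

For n = 3, `bmmc_permute(list(range(8)), [4, 2, 1])` performs bit reversal (bit i of y is bit 2 − i of x) and returns `[0, 4, 2, 6, 1, 5, 3, 7]`, while the identity matrix `[1, 2, 4]` with complement 7 gives vector reversal. Because composing two BMMC permutations gives A₂(A₁x ⊕ c₁) ⊕ c₂ = (A₂A₁)x ⊕ (A₂c₁ ⊕ c₂), which is again BMMC, factoring A into a product of simpler matrices lets one BMMC permutation be carried out as a sequence of simpler ones.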


Proceedings Article
27 Mar 1996
TL;DR: This paper examines routing of BMMC (bit-matrix-multiply/complement) permutations on two types of multistage interconnection networks: the expanded delta network and the global router of the MasPar MP-2.
Abstract: This paper examines routing of BMMC (bit-matrix-multiply/complement) permutations on two types of multistage interconnection networks: the expanded delta network and the global router of the MasPar MP-2. BMMC permutations are an important class of permutations that has been well-studied on various multistage networks. The class of BMMC permutations includes as subclasses Gray-code and inverse Gray-code permutations and the entire subclass of bit-permute/complement (BPC) permutations, which in turn includes matrix transpose (with power-of-2 dimensions), bit reversal, vector reversal, hypercube, and matrix reblocking permutations. There are four results in this paper. First, we use linear-algebraic techniques to derive an algorithm to perform any BMMC permutation in at most two passes on the expanded delta network. Second, we use linear-algebraic techniques to derive an algorithm to perform any BMMC permutation in at most two passes on the global router of the MasPar MP-2. Third, we use linear-algebraic and combinatorial analysis to determine the distribution of all BMMC permutations when routed naively through the MP-2 global router and show that most, but not all, BMMC permutations require only one or two passes anyway. We can apply our two-pass algorithms in those cases when naive routing requires more than two passes. Fourth, we present experimental evidence to support our analysis.

3 citations
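
The subclass relationships in this abstract can be checked directly in the y = Ax ⊕ c form of a BMMC permutation, with A a nonsingular n × n matrix over GF(2). A BPC permutation uses a permutation matrix (each target bit is one source bit, optionally complemented). Assuming the Gray-code permutation uses the standard binary-reflected code g = x ⊕ (x >> 1), its matrix has ones on the diagonal and superdiagonal, which is upper triangular and hence nonsingular. The Python sketch below builds these matrices as row bitmasks and checks that each induces a permutation; the helper names and the bitmask encoding are illustrative assumptions, and it does not model routing on the expanded delta network or the MP-2 global router.

```python
from __future__ import annotations


def apply_rows(x: int, rows: list[int]) -> int:
    """Return A x over GF(2); rows[i] is row i of A as a bitmask."""
    return sum((bin(row & x).count("1") & 1) << i for i, row in enumerate(rows))


def gray_rows(n: int) -> list[int]:
    """A for g = x XOR (x >> 1): bit i of g is x_i XOR x_(i+1)."""
    return [(1 << i) | ((1 << (i + 1)) if i + 1 < n else 0) for i in range(n)]


def inverse_gray_rows(n: int) -> list[int]:
    """A for the inverse Gray code: bit i of x is g_i XOR ... XOR g_(n-1)."""
    full = (1 << n) - 1
    return [full ^ ((1 << i) - 1) for i in range(n)]


def bpc_rows(pi: list[int]) -> list[int]:
    """Permutation matrix for a BPC permutation: source bit j goes to target bit pi[j]."""
    rows = [0] * len(pi)
    for j, target in enumerate(pi):
        rows[target] = 1 << j
    return rows


def is_permutation(rows: list[int], complement: int = 0) -> bool:
    """True if x -> A x XOR c is a bijection on n-bit addresses (A nonsingular)."""
    n = len(rows)
    targets = {apply_rows(x, rows) ^ complement for x in range(1 << n)}
    return len(targets) == 1 << n


# Example on 3-bit addresses.
for x in range(8):
    g = apply_rows(x, gray_rows(3))
    assert g == x ^ (x >> 1)                         # matches the Gray code
    assert apply_rows(g, inverse_gray_rows(3)) == x  # inverse Gray undoes it
print(is_permutation(gray_rows(3)), is_permutation(bpc_rows([1, 2, 0]), 0b101))
# prints: True True
```

For instance, `bpc_rows([1, 2, 0])` moves source bit 0 to target bit 1, bit 1 to bit 2, and bit 2 to bit 0. That transposes a 2 × 4 row-major matrix: the element at address 4r + c moves to 2c + r, an instance of the power-of-2 matrix transposes the abstract lists among the BPC permutations.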