
Showing papers by "Thomas H. Cormen" published in 1997


Proceedings Article · DOI
17 Nov 1997
TL;DR: In this paper, the authors extended an earlier out-of-core Fast Fourier Transform (FFT) method for a uniprocessor with the Parallel Disk Model (PDM) to use multiple processors.
Abstract: This paper extends an earlier out-of-core Fast Fourier Transform (FFT) method for a uniprocessor with the Parallel Disk Model (PDM) to use multiple processors. Four out-of-core multiprocessor methods are examined. Operationally, these methods differ in the size of the "mini-butterfly" computed in memory and in how the data are organized on the disks and in the distributed memory of the multiprocessor. The methods also perform differing amounts of I/O and communication. Two of them have the remarkable property that even though they are computing the FFT on a multiprocessor, all interprocessor communication occurs outside the mini-butterfly computations. Performance results on a small workstation cluster indicate that, except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do. Moreover, the faster methods are much easier to implement.

23 citations
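
The abstract above does not define a mini-butterfly precisely. As a minimal sketch, assuming a radix-2 decimation-in-time FFT, the code below groups `levels_per_pass` consecutive butterfly levels into one pass: within a pass, records whose indices differ only in the bits those levels touch form an independent group of 2**levels_per_pass records, which can be transformed in memory without touching any other group. Python lists stand in for disks and memories; no I/O, data layout, or interprocessor communication is modeled, and the names `fft_by_superlevels`, `levels_per_pass`, and `dft` are hypothetical rather than taken from the paper.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_by_superlevels(x, levels_per_pass):
    """Radix-2 DIT FFT whose butterfly levels are processed levels_per_pass at a time.

    In each pass, records whose indices differ only in the bits touched by that
    pass's levels form an independent group (a "mini-butterfly" in this sketch),
    which is copied into a small local buffer, transformed, and copied back.
    """
    n = len(x)
    lg_n = n.bit_length() - 1
    if n == 0 or n != 1 << lg_n:
        raise ValueError("length must be a power of 2")
    if levels_per_pass < 1:
        raise ValueError("levels_per_pass must be at least 1")
    a = [complex(x[bit_reverse(i, lg_n)]) for i in range(n)]  # bit-reversed input order
    done = 0  # butterfly levels completed so far
    while done < lg_n:
        depth = min(levels_per_pass, lg_n - done)
        group_bits = ((1 << depth) - 1) << done
        for base in range(n):
            if base & group_bits:
                continue  # visit each group once, from its smallest index
            idx = [base | (r << done) for r in range(1 << depth)]
            local = [a[i] for i in idx]  # stand-in for reading the group into memory
            for t in range(depth):
                half = 1 << (done + t)  # partner distance at global level done+t+1
                bit = 1 << t            # the same partner relation in local indices
                for r in range(1 << depth):
                    if r & bit:
                        continue
                    pos = idx[r] & (half - 1)  # position within its size-2*half block
                    w = cmath.exp(-2j * cmath.pi * pos / (2 * half))
                    u, v = local[r], w * local[r | bit]
                    local[r], local[r | bit] = u + v, u - v
            for r, i in enumerate(idx):
                a[i] = local[r]  # stand-in for writing the group back
        done += depth
    return a


def dft(x):
    """O(n^2) reference DFT used to check the result."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]
```

With `levels_per_pass = 1` this is the ordinary level-by-level FFT; a larger value cuts the number of passes over the data to ⌈lg N / levels_per_pass⌉. In a real out-of-core setting the groups of later passes are strided across the data, so the records would also have to be rearranged between passes.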


Proceedings Article
01 Jan 1997
TL;DR: Performance results on a small workstation cluster indicate that, except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do.

16 citations


Journal Article · DOI
01 Dec 1997
TL;DR: Approaches based on minimizing I/O costs with the Parallel Disk Model (PDM) are presented; each of these approaches explicitly plans and performs disk accesses so as to minimize their number.
Abstract: We examine approaches to computing the Fast Fourier Transform (FFT) when the data size exceeds the size of main memory. Analytical and experimental evidence shows that relying on native virtual memory with demand paging can yield extremely poor performance. We then present approaches based on minimizing I/O costs with the Parallel Disk Model (PDM). Each of these approaches explicitly plans and performs disk accesses so as to minimize their number.

12 citations
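
To see why demand paging can hurt, here is a toy model (not taken from the paper's analysis): count LRU page faults for the access pattern of an in-place iterative radix-2 FFT, with `page_size` records per page and `frames` page frames of memory. It assumes a single disk, ignores write-backs of dirty pages, and the function names are hypothetical.

```python
from collections import OrderedDict


def fft_access_pattern(n):
    """Yield record indices, in order, touched by an in-place iterative radix-2 FFT."""
    half = 1
    while half < n:  # one butterfly level per iteration
        for block in range(0, n, 2 * half):
            for j in range(half):
                yield block + j         # first butterfly input
                yield block + j + half  # second butterfly input
        half *= 2


def lru_faults(accesses, frames, page_size):
    """Count page faults for a stream of record indices under LRU replacement."""
    cache = OrderedDict()  # page number -> None, least recently used first
    faults = 0
    for rec in accesses:
        page = rec // page_size
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            cache[page] = None
            if len(cache) > frames:
                cache.popitem(last=False)  # evict the least recently used page
    return faults


N, B, M = 2**12, 2**4, 2**8  # records, records per page, records of memory
lg_n = N.bit_length() - 1
faults = lru_faults(fft_access_pattern(N), M // B, B)
print(faults, (N // B) * lg_n)
```

With N = 2^12 records, 16-record pages, and 16 frames (M = 256 records), each of the lg N = 12 butterfly levels sweeps the whole array, so LRU has evicted every page before it is needed again and each level faults on all N/B = 256 pages: (N/B) lg N = 3072 faults in total. Approaches that explicitly plan their disk accesses aim to do much better than one full sweep per level.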


Book Chapter · DOI
01 Jul 1997
TL;DR: An out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber is presented, and it is shown how to use dynamic programming to determine optimal splits at each recursive stage.
Abstract: We present an out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber. Our algorithm uses a recursive divide-and-conquer strategy, and each stage in the recursion presents several possibilities for how to split the problem into subproblems. We give a recurrence for the algorithm’s I/O complexity on the Parallel Disk Model and show how to use dynamic programming to determine optimal splits at each recursive stage. The algorithm to determine the optimal splits takes only Θ(lg² N) time for an N-point FFT, and it is practical. The out-of-core FFT algorithm itself takes considerably longer.

11 citations
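
The abstract above gives neither the recurrence nor its cost terms, so the sketch below uses hypothetical ones and only illustrates the shape of such a dynamic program: index problems by n = lg of their size, solve problems that fit in memory (n ≤ lg M) directly, and for larger n try every split 2^n = 2^k · 2^(n−k), reusing the already-computed costs of smaller sizes. Costs are assumed to be normalized per record (passes over the data), so the two factors' costs simply add. Trying n − 1 splits for each n up to lg N is Θ(lg² N) work, consistent with the bound stated in the abstract. The names `optimal_splits`, `split_cost`, and `plan` are illustrative only.

```python
def optimal_splits(lg_n, lg_m, split_cost):
    """Dynamic program over problem sizes 2**n, n = 1..lg_n (hypothetical cost model).

    best[n]:   minimum cost, in passes over the data, of a 2**n-point FFT.
    choice[n]: lg size k of the first factor in the best split 2**n = 2**k * 2**(n-k),
               or None when 2**n points fit in memory (n <= lg_m) and are done in-core.
    """
    best = [0.0] * (lg_n + 1)
    choice = [None] * (lg_n + 1)
    evaluations = 0  # number of candidate splits examined
    for n in range(1, lg_n + 1):
        if n <= lg_m:
            best[n] = 2.0  # assumed: read and write every record once
            continue
        best[n] = float("inf")
        for k in range(1, n):
            evaluations += 1
            # Assumed additive model: per-record costs of both factors, plus the
            # hypothetical cost of combining them.
            cost = best[k] + best[n - k] + split_cost(n, k)
            if cost < best[n]:
                best[n], choice[n] = cost, k
    return best, choice, evaluations


def plan(n, choice):
    """Expand the chosen splits into a nested tuple of in-core subproblem sizes."""
    k = choice[n]
    if k is None:
        return 2**n
    return (plan(k, choice), plan(n - k, choice))


best, choice, evaluations = optimal_splits(20, 8, lambda n, k: 2.0)
print(best[20], evaluations, plan(20, choice))
```

With the toy constant combining cost of 2 passes, lg N = 20 and lg M = 8, the program examines 162 candidate splits and reports a best cost of 10 passes. A realistic `split_cost` would depend on n, k, and the disk parameters, which is exactly what makes the choice of split nontrivial.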


Journal Article · DOI
TL;DR: An architecture-independent method for performing BMMC permutations on distributed-memory multiprocessors; it transmits only data, without any source or target indices, which conserves network bandwidth.
Abstract: This paper presents an architecture-independent method for performing BMMC permutations on multiprocessors with distributed memory. All interprocessor communication uses the MPI function MPI_Sendrecv_replace(). The number of elements and number of processors must be powers of 2, with at least one element per processor, and there is no inherent upper bound on the ratio of elements per processor. Our method transmits only data without transmitting any source or target indices, which conserves network bandwidth. When data is transmitted, the source and target processors implicitly agree on each other's identity and the indices of the elements being transmitted. A C-callable implementation of our method is available from Netlib. The implementation allows preprocessing (which incurs a modest cost) to be factored out for multiple runs of the same permutation, even if on different data. Data may be laid out in any one of several ways: processor-major, processor-minor, or anything in between.

4 citations
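
A BMMC (bit-matrix-multiply/complement) permutation maps each source index x, viewed as a vector of lg N bits, to the target index y = Ax ⊕ c, where A is a nonsingular lg N × lg N matrix over GF(2) and c is a complement vector. The sketch below shows that index arithmetic and why sender and receiver can agree without exchanging indices: every processor holds A and c, so every processor can compute the same mapping. It assumes processor-major layout puts the processor number in the most significant lg P bits of an index and processor-minor layout in the least significant; it does not use MPI or reproduce the Netlib implementation, and all names are hypothetical.

```python
def parity(v):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1


def bmmc_target(x, rows, c):
    """Target index y = A x XOR c over GF(2).

    rows[i] is row i of A as a bitmask (bit j set means A[i][j] = 1), so bit i
    of y is the parity of (rows[i] & x), flipped if bit i of c is set.
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y ^ c


def owner(index, lg_n, lg_p, layout):
    """Processor that holds `index` (both layout conventions are assumptions)."""
    if layout == "processor-major":
        return index >> (lg_n - lg_p)  # processor number in the high-order bits
    if layout == "processor-minor":
        return index & ((1 << lg_p) - 1)  # processor number in the low-order bits
    raise ValueError(f"unknown layout: {layout}")


def send_plan(rank, rows, c, lg_n, lg_p, layout):
    """(target processor, source index) pairs for the elements `rank` holds.

    Every processor knows rows and c, so the receiving side can compute the
    same pairs itself; only element values would need to cross the network.
    """
    return [
        (owner(bmmc_target(x, rows, c), lg_n, lg_p, layout), x)
        for x in range(1 << lg_n)
        if owner(x, lg_n, lg_p, layout) == rank
    ]
```

For example, the anti-identity matrix (row i has only bit lg N − 1 − i set) with c = 0 gives the bit-reversal permutation, which is therefore one BMMC permutation.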