Showing papers by "Thomas H. Cormen" published in 1997

Proceedings Article

Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)

Thomas H. Cormen¹, Jake Wegmann¹, David M. Nicol¹•Institutions (1)

Dartmouth College¹

17 Nov 1997

TL;DR: The authors extend an earlier uniprocessor out-of-core Fast Fourier Transform (FFT) method for the Parallel Disk Model (PDM) to use multiple processors.

Abstract: This paper extends an earlier out-of-core Fast Fourier Transform (FFT) method for a uniprocessor with the Parallel Disk Model (PDM) to use multiple processors. Four out-of-core multiprocessor methods are examined. Operationally, these methods differ in the size of "mini-butterfly" computed in memory and how the data are organized on the disks and in the distributed memory of the multiprocessor. The methods also perform differing amounts of I/O and communication. Two of them have the remarkable property that even though they are computing the FFT on a multiprocessor, all interprocessor communication occurs outside the mini-butterfly computations. Performance results on a small workstation cluster indicate that except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do. Moreover, the faster methods are much easier to implement.

23 citations

Proceedings Article

Multiprocessor Out-of-Core FFTs with Distributed Memory and Parallel Disks

Thomas H. Cormen¹, Jake Wegmann¹, David M. Nicol¹•Institutions (1)

Dartmouth College¹

01 Jan 1997

TL;DR: Performance results on a small workstation cluster indicate that except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do.

16 citations

Journal Article

Out-of-core FFTs with parallel disks

Thomas H. Cormen¹, David M. Nicol¹•Institutions (1)

Dartmouth College¹

01 Dec 1997

TL;DR: Approaches based on minimizing I/O costs with the Parallel Disk Model (PDM) are presented. Each of these approaches explicitly plans and performs disk accesses so as to minimize their number.

Abstract: We examine approaches to computing the Fast Fourier Transform (FFT) when the data size exceeds the size of main memory. Analytical and experimental evidence shows that relying on native virtual memory with demand paging can yield extremely poor performance. We then present approaches based on minimizing I/O costs with the Parallel Disk Model (PDM). Each of these approaches explicitly plans and performs disk accesses so as to minimize their number.

12 citations
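As a rough illustration of why an FFT too large for memory can still be computed in pieces, the sketch below uses the standard Cooley-Tukey factorization N = n1 * n2, which replaces one N-point transform with two passes of smaller transforms plus a twiddle-factor multiply. This is a textbook technique shown in memory, not the PDM-based method of the paper above; the function names `dft` and `fft_two_pass` are illustrative, and a real out-of-core version would read and write each sub-transform's data from disk.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, standing in for an in-memory FFT of a small piece."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_two_pass(x, n1):
    """Compute the DFT of x (length n1 * n2) using only length-n2 and length-n1 transforms."""
    n = len(x)
    n2 = n // n1
    assert n1 * n2 == n, "n1 must divide len(x)"

    # Pass 1: n1 independent transforms of length n2 over the strided slices x[j1::n1].
    cols = [dft(x[j1::n1]) for j1 in range(n1)]

    # Twiddle step: scale entry (j1, k2) by exp(-2*pi*i * j1 * k2 / n).
    for j1 in range(n1):
        for k2 in range(n2):
            cols[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)

    # Pass 2: n2 independent transforms of length n1; result k1 lands at index k2 + n2 * k1.
    out = [0j] * n
    for k2 in range(n2):
        row = dft([cols[j1][k2] for j1 in range(n1)])
        for k1 in range(n1):
            out[k2 + n2 * k1] = row[k1]
    return out
```

Each pass touches every element once, so if only pieces of size max(n1, n2) fit in memory, the whole transform needs a small number of sweeps over the data rather than the scattered accesses that demand paging would produce.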

Book Chapter

Determining an Out-of-Core FFT Decomposition Strategy for Parallel Disks by Dynamic Programming

Thomas H. Cormen¹•Institutions (1)

Dartmouth College¹

01 Jul 1997

TL;DR: An out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber is presented, and it is shown how to use dynamic programming to determine optimal splits at each recursive stage.

Abstract: We present an out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber. Our algorithm uses a recursive divide-and-conquer strategy, and each stage in the recursion presents several possibilities for how to split the problem into subproblems. We give a recurrence for the algorithm's I/O complexity on the Parallel Disk Model and show how to use dynamic programming to determine optimal splits at each recursive stage. The algorithm to determine the optimal splits takes only Θ(lg² N) time for an N-point FFT, and it is practical. The out-of-core FFT algorithm itself takes considerably longer.

11 citations
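The split-selection idea in the abstract above can be sketched as a small dynamic program over n = lg N. The cost model here is a placeholder, not the paper's I/O recurrence: a subproblem of 2^k points is assumed to cost one pass if k <= m (it fits in memory), and otherwise the cost of its two parts plus a caller-supplied `permute_cost(a, b)`. The names `plan_splits`, `m`, and `permute_cost` are hypothetical.

```python
def plan_splits(n, m, permute_cost):
    """Choose how to split a 2**n-point problem into pieces of at most 2**m points.

    Returns (cost, plan), where plan lists the exponents of the in-memory pieces.
    Assumed cost model: one pass for a piece that fits in memory; splitting
    k = a + b costs best[a] + best[b] + permute_cost(a, b).
    """
    best = [0.0] * (n + 1)   # best[k]: minimum cost for a 2**k-point subproblem
    choice = [0] * (n + 1)   # choice[k]: chosen a for the split k = a + (k - a); 0 means no split

    for k in range(1, n + 1):
        if k <= m:
            best[k] = 1.0    # fits in memory: a single pass, no split
            continue
        best[k] = float("inf")
        for a in range(1, k):            # try every split point
            cost = best[a] + best[k - a] + permute_cost(a, k - a)
            if cost < best[k]:
                best[k], choice[k] = cost, a

    def expand(k):
        """Follow the recorded choices down to the in-memory pieces."""
        if choice[k] == 0:
            return [k]
        return expand(choice[k]) + expand(k - choice[k])

    return best[n], expand(n)
```

For example, `plan_splits(10, 4, lambda a, b: 1.0)` returns a cost of 5.0 with three pieces whose exponents sum to 10. The table has n + 1 entries and each entry tries fewer than n splits, so with a constant-time cost function the planning step takes O(n²) = O(lg² N) operations, which is in line with the bound quoted in the abstract.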

Journal Article

Performing BMMC Permutations Efficiently on Distributed-Memory Multiprocessors with MPI

Thomas H. Cormen¹•Institutions (1)

Dartmouth College¹

01 May 1997, Algorithmica

TL;DR: An architecture-independent method for performing BMMC permutations on multiprocessors with distributed memory that transmits only data without transmitting any source or target indices, which conserves network bandwidth.

Abstract: This paper presents an architecture-independent method for performing BMMC permutations on multiprocessors with distributed memory. All interprocessor communication uses the MPI function MPI_Sendrecv_replace(). The number of elements and number of processors must be powers of 2, with at least one element per processor, and there is no inherent upper bound on the ratio of elements per processor. Our method transmits only data without transmitting any source or target indices, which conserves network bandwidth. When data is transmitted, the source and target processors implicitly agree on each other's identity and the indices of the elements being transmitted. A C-callable implementation of our method is available from Netlib. The implementation allows preprocessing (which incurs a modest cost) to be factored out for multiple runs of the same permutation, even if on different data. Data may be laid out in any one of several ways: processor-major, processor-minor, or anything in between.

4 citations
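For readers unfamiliar with the term, a BMMC (bit-matrix-multiply/complement) permutation maps each n-bit source index x to the target index y = A x XOR c, where A is a nonsingular n x n bit matrix and c is an n-bit complement vector, with arithmetic over GF(2). The following is a minimal serial sketch of that mapping only; it does not model the distributed-memory MPI method described in the abstract above. The row-as-bitmask encoding and the names `bmmc_target`, `bmmc_permute`, and `is_nonsingular` are assumptions made for illustration.

```python
def parity(v):
    """Return 1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def is_nonsingular(rows):
    """Check by Gauss-Jordan elimination over GF(2) that the bit matrix is invertible."""
    rows = list(rows)
    n = len(rows)
    for col in range(n):
        pivot = next((r for r in range(col, n) if (rows[r] >> col) & 1), None)
        if pivot is None:
            return False
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return True


def bmmc_target(x, rows, c):
    """Compute y = A x XOR c; rows[i] is row i of A as a bitmask (bit j = column j)."""
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i   # bit i of A x is the GF(2) dot product of row i with x
    return y ^ c


def bmmc_permute(data, rows, c=0):
    """Move data[x] to position bmmc_target(x, rows, c) for every index x."""
    n = len(rows)
    assert len(data) == 1 << n, "data length must be 2**n"
    assert is_nonsingular(rows), "A must be nonsingular for the map to be a permutation"
    out = [None] * len(data)
    for x, value in enumerate(data):
        out[bmmc_target(x, rows, c)] = value
    return out
```

With n = 3 and `rows = [0b100, 0b010, 0b001]`, the matrix reverses the bit order of each index, so `bmmc_permute(list(range(8)), rows)` yields `[0, 4, 2, 6, 1, 5, 3, 7]`. Because the target index is a pure function of the source index, both ends of a transfer can compute it independently, which is the property that lets a method avoid sending indices alongside the data.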