Book Chapter

A Blocking Algorithm for FFT on Cache-Based Processors

TLDR
A block six-step FFT algorithm is presented for computing large one-dimensional fast Fourier transforms (FFTs) on cache-based processors; blocking improves performance by using the cache memory effectively.
Abstract
In this paper, we propose a blocking algorithm for computing large one-dimensional fast Fourier transforms (FFTs) on cache-based processors. Our proposed FFT algorithm is based on the six-step FFT algorithm. We show that the block six-step FFT algorithm improves performance by effectively utilizing the cache memory. Performance results of one-dimensional FFTs on the Sun Ultra 10 and a Pentium III PC are reported. We obtained performance of about 108 MFLOPS on the Sun Ultra 10 (UltraSPARC-IIi, 333 MHz) and about 247 MFLOPS on the 1 GHz Pentium III PC for a 2^20-point FFT.
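
For readers unfamiliar with the six-step FFT that the abstract builds on, the following is a minimal sketch of the plain (unblocked) six-step scheme for N = n1 * n2 points. It is not the paper's blocked variant and not the authors' code: the function names `dft`, `transpose`, and `six_step_fft` are hypothetical, and a naive O(n^2) DFT stands in for the small sub-transforms that a real implementation would compute with an in-cache FFT.

```python
import cmath

def dft(v):
    """Naive O(n^2) DFT; a stand-in for a small in-cache FFT (assumption)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def transpose(rows):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*rows)]

def six_step_fft(x, n1, n2):
    """Plain six-step FFT of len(x) == n1 * n2 points (illustrative sketch)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # View x as an n2-by-n1 row-major matrix: a[j2][j1] = x[j2 * n1 + j1].
    a = [list(x[r * n1:(r + 1) * n1]) for r in range(n2)]
    # Step 1: transpose, giving n1 rows of length n2.
    a = transpose(a)
    # Step 2: n1 independent n2-point transforms (one per row).
    a = [dft(row) for row in a]
    # Step 3: multiply element (j1, k2) by the twiddle factor exp(-2*pi*i*j1*k2/n).
    a = [[a[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
          for k2 in range(n2)] for j1 in range(n1)]
    # Step 4: transpose, giving n2 rows of length n1.
    a = transpose(a)
    # Step 5: n2 independent n1-point transforms (one per row).
    a = [dft(row) for row in a]
    # Step 6: transpose so that output index k = k1 * n2 + k2 is in natural order.
    a = transpose(a)
    return [v for row in a for v in row]
```

The three transposes are where a large transform touches memory with poor locality; a blocked variant such as the one the abstract describes aims to reorganize this work so that data stays in cache between steps. The sketch above only illustrates the decomposition itself.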


Citations
Journal Article

The Design and Implementation of FFTW3

TL;DR: It is shown that such an approach can yield an implementation of the discrete Fourier transform that is competitive with hand-optimized libraries, and the software structure that makes the current FFTW3 version flexible and adaptive is described.
Journal Article

A new incompressible Navier-Stokes solver combining Fourier pseudo-spectral and immersed boundary methods

TL;DR: In this article, a new numerical methodology combining Fourier pseudo-spectral and immersed boundary methods is developed for fluid flow problems governed by the incompressible Navier-Stokes equations.

Tuning hardware and software for multiprocessors

TL;DR: This work describes communication-avoiding algorithms and highly optimized implementations of a sparse linear algebra kernel called "matrix powers" and demonstrates co-tuning, which improves hardware area and power efficiency by up to 3× and 2.4× respectively.
Journal Article

Atomistic Modeling of Ultrathin Surface Oxide Growth on a Ternary Alloy: Oxidation of Al−Ni−Fe

TL;DR: In this article, the surface oxide film growth on aluminum−nickel−iron alloys has been studied at 300 and 600 K. The dynamics of oxidation and oxide growth is strongly d...
Book Chapter

A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs

TL;DR: A blocking algorithm based on the six-step FFT algorithm is proposed for a parallel one-dimensional fast Fourier transform (FFT) on clusters of PCs; it achieves performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
References
Book

Computational Frameworks for the Fast Fourier Transform

TL;DR: The Radix-2 Frameworks, a collection of general and high performance FFTs designed to solve the multi-Dimensional FFT problem of Prime Factor and Convolution, are presented.
Report

The Fastest Fourier Transform in the West

TL;DR: FFTW is typically faster than all other publicly available DFT software, including the well-known FFTPACK and the code from Numerical Recipes, and is competitive with or better than proprietary, highly tuned codes such as Sun's Performance Library and IBM's ESSL library.
Journal Article

FFTs in external or hierarchical memory

TL;DR: Advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory that require as few as two passes through the external data set, employ strictly unit stride, long vector transfers between main memory and external storage, and are well suited for vector and parallel computation are described.
Journal Article

High Performance FFT Algorithms for Cache-Coherent Multiprocessors

TL;DR: Efficient algorithms for out-of-cache one-dimensional fast Fourier transforms on microprocessors and a natural parallelism are developed.