The Design and Implementation of FFTW3
Citations
QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials
Evidence for wavelike energy transfer through quantum coherence in photosynthetic systems
The 6dF Galaxy Survey: baryon acoustic oscillations and the local Hubble constant
The Landscape of Parallel Computing Research: A View from Berkeley
References
Introduction to Algorithms
Numerical Recipes in C: The Art of Scientific Computing
An algorithm for the machine calculation of complex Fourier series
The MD5 Message-Digest Algorithm
Frequently Asked Questions (15)
Q2. What format is used to store the data in a 2-D matrix?
An n1 × n2 2-D matrix is typically stored in C using row-major format: size-n2 contiguous arrays for each row, stored as n1 consecutive blocks starting from a pointer (for input/output).
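The row-major layout can be illustrated with a flat list standing in for the block of memory behind the pointer. This is a minimal sketch: the names `row_major_index`, `flat`, `row1`, and `col2` are illustrative and do not come from the paper.

```python
def row_major_index(i, j, n2):
    """Flat offset of element (i, j) in a row-major matrix with n2 columns."""
    return i * n2 + j

n1, n2 = 3, 4
flat = list(range(n1 * n2))  # stands in for n1 * n2 consecutive memory cells

# Row 1 is contiguous: it occupies offsets [1*n2, 2*n2).
row1 = flat[1 * n2:2 * n2]

# Column 2 is discontiguous: consecutive elements are n2 cells apart.
col2 = flat[2::n2]
```

Rows are therefore the contiguous dimension and columns the strided one, which is why the direction of a transform matters for memory access.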
Q3. What is the basic idea behind the Cooley–Tukey FFT?
The basic idea behind this FFT is that a DFT of a composite size n = n1n2 can be re-expressed in terms of smaller DFTs of sizes n1 and n2; essentially, as a 2-D DFT of size n1 × n2 where the output is transposed.
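One step of this decomposition can be sketched in a few lines. The sketch assumes the usual index split j = j1·n2 + j2 for inputs and k = k1 + k2·n1 for outputs, with n = n1·n2; the function names are illustrative, and the naive `dft` helper stands in for whatever smaller transform would be used in practice.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, used both as the sub-transform and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def cooley_tukey_step(x, n1, n2):
    """One Cooley-Tukey step for len(x) == n1 * n2."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n2 DFTs of size n1 over the strided inputs x[j2], x[j2 + n2], ...
    inner = [dft(x[j2::n2]) for j2 in range(n2)]      # inner[j2][k1]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2 * k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            inner[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Step 3: n1 DFTs of size n2 over j2; output index k = k1 + k2 * n1,
    # which is the transposition mentioned in the text.
    y = [0j] * n
    for k1 in range(n1):
        col = dft([inner[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            y[k1 + k2 * n1] = col[k2]
    return y
```

Comparing `cooley_tukey_step(x, 3, 4)` against `dft(x)` for a length-12 input is a quick way to check the index bookkeeping.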
Q4. How can one decompose a trigonometric transform?
One can also compute symmetric DFTs by directly specializing the Cooley–Tukey algorithm, removing redundant operations as the authors did for real inputs, to decompose the transform into smaller symmetric transforms [53], [56], [57].
Q5. What is the way to solve a problem of arbitrary vector rank?
One must reduce a problem of arbitrary vector rank to a set of loops nested around a problem of vector rank 0, i.e., a single (possibly multidimensional) DFT.
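The reduction to vector rank 0 can be sketched as a recursion that peels off one loop at a time. This is a simplified illustration, not FFTW's actual data structures: the vector loops are assumed to be a list of hypothetical `(count, in_stride, out_stride)` triples, and the rank-0 problem is a single naive 1-D DFT.

```python
import cmath

def solve_rank0(inp, out, n, in_off, out_off, in_stride, out_stride):
    """Naive strided 1-D DFT; stands in for any rank-0 solver."""
    for k in range(n):
        out[out_off + k * out_stride] = sum(
            inp[in_off + j * in_stride] * cmath.exp(-2j * cmath.pi * j * k / n)
            for j in range(n))

def solve(inp, out, n, in_stride, out_stride, vec, in_off=0, out_off=0):
    """vec is a list of (count, in_stride, out_stride) loop dimensions."""
    if not vec:  # vector rank 0: a single DFT
        solve_rank0(inp, out, n, in_off, out_off, in_stride, out_stride)
        return
    (count, v_is, v_os), rest = vec[0], vec[1:]
    for v in range(count):  # peel one loop, recurse on vector rank - 1
        solve(inp, out, n, in_stride, out_stride, rest,
              in_off + v * v_is, out_off + v * v_os)

# Transform each row of a 2 x 4 row-major matrix: one loop of 2 rows, 4 apart.
inp = [1, 0, 0, 0, 1, 1, 1, 1]
out = [0j] * 8
solve(inp, out, 4, 1, 1, [(2, 4, 4)])
```

The order in which the loops are nested is a free choice in this sketch; the text's later discussion of loop reordering concerns exactly that choice.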
Q6. What is the function that creates the plan?
When invoked by the planner, a solver creates the plan for the given problem (if possible) and it initializes any auxiliary data required by the plan (such as twiddle factors).
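The division of labor between solver and plan can be sketched as follows. All names here (`NaiveDftPlan`, `naive_solver`, the dictionary-based problem description) are hypothetical and chosen only for illustration; the point is that the solver either declines a problem or returns a plan whose auxiliary data, such as twiddle factors, is computed once up front.

```python
import cmath

class NaiveDftPlan:
    """A plan: precomputed data plus an execute step that reuses it."""

    def __init__(self, n):
        self.n = n
        # Auxiliary data initialized at planning time, not at execution time.
        self.twiddles = [cmath.exp(-2j * cmath.pi * t / n) for t in range(n)]

    def execute(self, x):
        n, w = self.n, self.twiddles
        return [sum(x[j] * w[(j * k) % n] for j in range(n)) for k in range(n)]

def naive_solver(problem):
    """Return a plan for the problem, or None if this solver does not apply."""
    n = problem.get("n")
    if problem.get("kind") != "dft" or not isinstance(n, int) or n < 1:
        return None
    return NaiveDftPlan(n)

plan = naive_solver({"kind": "dft", "n": 4})
result = plan.execute([1, 1, 1, 1])
```

Because the plan owns its twiddle table, it can be executed many times on different inputs without repeating the setup cost.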
Q7. How many iterations of its loop does a codelet execute in parallel?
On machines that support vectors of length 4, the authors view SIMD data as vectors of two complex numbers, and each codelet executes two iterations of its loop in parallel.
Q8. Why does the planner have to represent a problem?
Because only problems that can be expressed can be solved, the representation of a problem determines an upper bound to the space of plans that the planner can explore; therefore, it ultimately constrains FFTW’s performance.
Q9. What is the advantage of FFTW on the Pentium IV?
FFTW is the only library that exploits SIMD instructions for nonpower-of-two sizes, which gives it an advantage on the Pentium IV for this case.
Q10. Why is the answer to this question constantly changing?
Because computer hardware is continually changing, the answer to this question has been continually changing as well.
Q11. What is the penalty for a 1024 × 1024 2-D complex-data transform?
In other cases, however, the penalty from impatient mode can be larger; for example, it has a 47% penalty for a 1024 × 1024 2-D complex-data transform on the same machine, since vector recursion proves important there for the discontiguous (row) dimension of the transform.
Q12. What is the effect of vector reordering?
Another example of the effect of loop reordering is a style of plan that the authors sometimes call vector recursion (unrelated to “vector-radix” FFTs [16]).
Q13. What is the key difficulty in implementing the Cooley–Tukey FFT?
(On the other end of the scale, a “radix” of roughly √n has been called a four-step FFT [18], and the authors have found that one step of such a radix can be useful for large sizes in FFTW; see Section IV-D1.) A key difficulty in implementing the Cooley–Tukey FFT is that the n1 dimension corresponds to discontiguous inputs j1 in X but contiguous outputs k1 in Y, and vice versa for n2.
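The stride mismatch can be made concrete with the index maps commonly used for one Cooley-Tukey step. The sketch assumes n = n1·n2 with input index j = j1·n2 + j2 and output index k = k1 + k2·n1; the helper names are illustrative.

```python
def input_index(j1, j2, n2):
    """Position in the input array of the element labeled (j1, j2)."""
    return j1 * n2 + j2

def output_index(k1, k2, n1):
    """Position in the output array of the element labeled (k1, k2)."""
    return k1 + k2 * n1

n1, n2 = 4, 3

# Along the n1 dimension: inputs are n2 apart, outputs are adjacent.
in_stride_j1 = input_index(1, 0, n2) - input_index(0, 0, n2)
out_stride_k1 = output_index(1, 0, n1) - output_index(0, 0, n1)

# Along the n2 dimension the roles are swapped.
in_stride_j2 = input_index(0, 1, n2) - input_index(0, 0, n2)
out_stride_k2 = output_index(0, 1, n1) - output_index(0, 0, n1)
```

Whichever dimension is transformed first, one side of the computation has to work with strided data, which is the difficulty the answer describes.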
Q14. How is the type-I trigonometric transform computed?
For the type-I DCT/DST, however, the authors could not find any accurate algorithm to re-express the transform in terms of an equal-length real-input DFT; thus, the authors resort to the “slow” method of embedding it in a real-input DFT of roughly twice the length.
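The embedding idea for the type-I DCT can be checked numerically. The sketch below assumes the unnormalized DCT-I convention Y_k = X_0 + (-1)^k X_{n-1} + 2 Σ_{j=1}^{n-2} X_j cos(πjk/(n-1)) and an even extension of length 2(n-1); both the convention and the function names are assumptions made for illustration, and a naive DFT stands in for a fast real-input transform.

```python
import cmath
import math

def dct1_direct(x):
    """Unnormalized DCT-I evaluated straight from its cosine-sum definition."""
    n = len(x)
    return [x[0] + (-1) ** k * x[-1]
            + 2 * sum(x[j] * math.cos(math.pi * j * k / (n - 1))
                      for j in range(1, n - 1))
            for k in range(n)]

def dct1_via_dft(x):
    """Same transform via a DFT of the even extension (requires len(x) >= 2)."""
    n = len(x)
    # Even extension: x[0], ..., x[n-1], x[n-2], ..., x[1]; length 2 * (n - 1).
    ext = list(x) + list(x[-2:0:-1])
    big_n = len(ext)
    # Only the first n outputs are needed; they are real by symmetry.
    return [sum(ext[j] * cmath.exp(-2j * cmath.pi * j * k / big_n)
                for j in range(big_n)).real
            for k in range(n)]
```

The cost of this route is a transform about twice as long as the input, which is why the text calls it the “slow” method.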
Q15. What is the way to reduce planner time relative to their observed benefits?
The planner can operate in an impatient mode that reduces the space of plans by eliminating some possibilities that appear to inordinately increase planner time relative to their observed benefits.