
Showing papers on "Split-radix FFT algorithm" published in 1987


Journal ArticleDOI
TL;DR: A new implementation of the real-valued split-radix FFT is presented, an algorithm that uses fewer operations than any other real-valued power-of-2-length FFT.
Abstract: This tutorial paper describes the methods for constructing fast algorithms for the computation of the discrete Fourier transform (DFT) of a real-valued series. The application of these ideas to all the major fast Fourier transform (FFT) algorithms is discussed, and the various algorithms are compared. We present a new implementation of the real-valued split-radix FFT, an algorithm that uses fewer operations than any other real-valued power-of-2-length FFT. We also compare the performance of inherently real-valued transform algorithms such as the fast Hartley transform (FHT) and the fast cosine transform (FCT) to real-valued FFT algorithms for the computation of power spectra and cyclic convolutions. Comparisons of these techniques reveal that the alternative techniques always require more additions than a method based on a real-valued FFT algorithm and result in computer code of equal or greater length and complexity.

489 citations


Book
01 Jan 1987
TL;DR: This book discusses the Discrete Fourier Transform (DFT) and a few applications of the DFT, as well as some of the techniques used in real sequences and the Real DFT.
Abstract: Preface 1. Introduction. A Bit of History An Application Problems 2. The Discrete Fourier Transform (DFT). Introduction DFT Approximation to the Fourier Transform The DFT-IDFT pair DFT Approximations to Fourier Series Coefficients The DFT from Trigonometric Approximation Transforming a Spike Train Limiting Forms of the DFT-IDFT Pair Problems 3. Properties of the DFT. Alternate Forms for the DFT Basic Properties of the DFT Other Properties of the DFT A Few Practical Considerations Analytical DFTs Problems 4. Symmetric DFTs. Introduction Real sequences and the Real DFT (RDFT) Even Sequences and the Discrete Cosine Transform (DCT) Odd Sequences and the Discrete Sine Transform (DST) Computing Symmetric DFTs Notes Problems 5. Multi-dimensional DFTs. Introduction Two-dimensional DFTs Geometry of Two-Dimensional Modes Computing Multi-Dimensional DFTs Symmetric DFTs in Two Dimensions Problems 6. Errors in the DFT. Introduction Periodic, Band-limited Input Periodic, Non-band-limited Input Replication and the Poisson Summation Formula Input with Compact Support General Band-Limited Functions General Input Errors in the Inverse DFT DFT Interpolation - Mean Square Error Notes and References Problems 7. A Few Applications of the DFT. Difference Equations - Boundary Value Problems Digital Filtering of Signals FK Migration of Seismic Data Image Reconstruction from Projections Problems 8. Related Transforms. Introduction The Laplace Transform The z-Transform The Chebyshev Transform Orthogonal Polynomial Transforms The Discrete Hartley Transform (DHT) Problems 9. Quadrature and the DFT. Introduction The DFT and the Trapezoid Rule Higher Order Quadrature Rules Problems 10. The Fast Fourier Transform (FFT). Introduction Splitting Methods Index Expansions (One ---> Multi-dimensional) Matrix Factorizations Prime Factor and Convolution Methods FFT Performance Notes Problems Glossary of (Frequently and Consistently Used) Notations References.

354 citations


Journal ArticleDOI
D.M.W. Evans1
TL;DR: An elegant algorithm has been found that performs this "perfect shuffle" more efficiently and, according to timing experiments, runs about eight times faster than the fastest other algorithm known to the author.
Abstract: All radix-B fast Fourier transforms (FFT) or fast Hartley transforms (FHT) performed "in-place" require at some point that the sequence elements be permuted such that, indexing the elements 0 to N - 1, the element with index i is swapped with the element whose index is j. The permutation is called digit-reversing, because if i is represented as a string of digits, base B, then j is that index whose representation is the same string of digits written in reverse order. N is a power of B and B ≥ 2. An elegant algorithm has been found that performs this "perfect shuffle" more efficiently and, according to timing experiments, runs about eight times faster than the fastest other algorithm known to the author. The algorithm is of order O(N) and led, for example, to a saving of 7 percent in the total (radix-2) FFT running time for N = 1024.

59 citations
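
The digit-reversing permutation described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration of the permutation itself, not the O(N) algorithm from the paper: it recomputes each reversed index digit by digit, so it costs O(N log N). The function names `digit_reverse` and `digit_reverse_permute` are hypothetical.

```python
def digit_reverse(i, base, num_digits):
    """Return the index whose base-`base` digit string is that of `i` reversed."""
    j = 0
    for _ in range(num_digits):
        i, digit = divmod(i, base)  # peel off the lowest digit of i
        j = j * base + digit        # append it as the next digit of j
    return j


def digit_reverse_permute(seq, base=2):
    """Permute `seq` in place into digit-reversed order and return it.

    len(seq) must be a power of `base`, with base >= 2.
    """
    n = len(seq)
    num_digits, size = 0, 1
    while size < n:
        size *= base
        num_digits += 1
    if size != n:
        raise ValueError("length must be a power of the base")
    for i in range(n):
        j = digit_reverse(i, base, num_digits)
        if i < j:  # swap each pair exactly once
            seq[i], seq[j] = seq[j], seq[i]
    return seq
```

For base 2 and N = 8 this yields the familiar bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7.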


Journal ArticleDOI
TL;DR: The author's own involvement and experience with the FFT algorithm is described, which led to an unfolding of its pre-electronic computer history going back to Gauss.
Abstract: The discovery of the fast Fourier transform (FFT) algorithm and the subsequent development of algorithmic and numerical methods based on it have had an enormous impact on the ability of computers to process digital representations of signals, or functions. At first, the FFT was regarded as entirely new. However, attention and wide publicity led to an unfolding of its pre-electronic computer history going back to Gauss. The present paper describes the author's own involvement and experience with the FFT algorithm.

54 citations


Journal ArticleDOI
01 Feb 1987
TL;DR: A three-dimensional (3-D) Discrete Fourier Transform (DFT) algorithm for real data using the one-dimensional Fast Hartley Transform (FHT) is introduced that is simpler and retains the speed advantage that is characteristic of the Hartley approach.
Abstract: A three-dimensional (3-D) Discrete Fourier Transform (DFT) algorithm for real data using the one-dimensional Fast Hartley Transform (FHT) is introduced. It requires the same number of one-dimensional transforms as a direct FFT approach but is simpler and retains the speed advantage that is characteristic of the Hartley approach. The method utilizes a decomposition of the cas function kernel of the Hartley transform to obtain a temporary transform, which is then corrected by some additions to yield the 3-D DFT. A Fortran subroutine is available on request.

45 citations
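
To make the link between the Hartley and Fourier transforms concrete, here is a one-dimensional sketch (not the 3-D method of the paper above). It assumes the usual cas kernel, cas(t) = cos(t) + sin(t), and recovers the DFT of a real sequence from its discrete Hartley transform with a few additions. The naive O(N^2) `dht` stands in for a fast Hartley transform; both function names are hypothetical.

```python
import math


def dht(x):
    """Naive O(N^2) discrete Hartley transform using the cas kernel."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for t in range(n):
            angle = 2 * math.pi * t * k / n
            acc += x[t] * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out


def dft_from_dht(h):
    """Recover the DFT of a real sequence from its Hartley transform `h`.

    Re X[k] = (H[k] + H[N-k]) / 2 and Im X[k] = (H[N-k] - H[k]) / 2,
    with the index N-k taken modulo N.
    """
    n = len(h)
    out = []
    for k in range(n):
        hk, hm = h[k], h[(n - k) % n]
        out.append(complex((hk + hm) / 2, (hm - hk) / 2))
    return out
```

The correction step uses only additions and a halving per output bin, which is the same kind of post-processing the abstract describes for the 3-D case.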


Journal ArticleDOI
R.C. Agarwal1, J.W. Cooley
01 Sep 1987
TL;DR: The algorithm formulation and implementation described here not only achieves full vector utilization but successfully copes with the problems of hierarchical storage.
Abstract: A number of previous attempts at the vectorization of the fast Fourier transform (FFT) algorithm have fallen somewhat short of achieving the full potential speed of vector processors. The algorithm formulation and implementation described here not only achieves full vector utilization but successfully copes with the problems of hierarchical storage. In the present paper, these techniques are described and extended to the general mixed radix algorithms, prime factor algorithm (PFA), the multidimensional discrete Fourier transform (DFT), the rectangular transform convolution algorithms, and the Winograd fast Fourier transform algorithm. Some of the methods were used in the Engineering Scientific Subroutine Library for the IBM 3090 Vector Facility. Using this approach, very good and consistent performance was obtained over a very wide range of transform lengths.

42 citations


Journal ArticleDOI
TL;DR: In this paper, the numerical inversion of Laplace transforms by means of the finite Fourier cosine transform, as presented by Dubner and Abate, was analyzed, and it was found that the proper inversion formula should contain the Fourier sine series as well.

40 citations


Patent
14 Sep 1987
TL;DR: A fast Fourier transform circuit is described, including an illustrative radix-eight DFT kernel that operates on an n-bit-serial data format for efficient serial-like, pipelined operation within the DFT.
Abstract: A fast Fourier transform circuit, including an illustrative radix-eight discrete Fourier transform (DFT) kernel that operates on an n-bit-serial data format, for an efficient serial-like, pipelined operation within the DFT. The circuit performs a four-point DFT on half of the input data words at a time, stores intermediate results from the four-point DFT in a commutation stage, then combines the intermediate results in two two-point DFTs. Internal multiplication in the eight-point DFT is effected in delay registers that also serve to store the intermediate results, thereby providing an economy of timing and circuit routing. Interleaving and deinterleaving operations convert the data format between three-bit-serial and conventional bit-parallel used outside the eight-point DFT kernel, which may therefore be easily cascaded for more complex FFT operations. The DFT kernel also includes means for selectively bypassing butterfly computation modules to perform shorter-length DFTs.

30 citations


Journal ArticleDOI
TL;DR: A novel processor for the implementation of multiplierless FFT's in VLSI with the capability of achieving a 40 MHz throughput rate for a 1024-point FFT using 20 processing IC's is presented.
Abstract: This paper presents a novel processor for the implementation of multiplierless FFT's in VLSI. The arithmetic scheme is specially tailored for the simple binary coefficients used for these FFT's, which make multiplication trivial. (The class of coefficients dealt with are those that have a maximum of 2 nonzero digits; i.e., sum of 2 integer powers of 2 with each power in the range 0-4.) A single chip processing element for a 4-point DFT (for a radix 4 FFT) with an execution time of 400 ns using a 10 MHz clock has been realized. The chip has an estimated maximum gate count of 11 000 and pin count of 85. It has the capability of achieving a 40 MHz throughput rate for a 1024-point FFT using 20 processing IC's. The use of the 4-point chip to implement higher radix algorithms and various other issues are discussed.

22 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: An algorithm for evaluating the Discrete Fourier Transform at a particular output frequency is derived using a technique called summation by parts (SBP), which is shown to reduce the number of multiplications and the number of bits per multiplicative coefficient needed to implement the DFT.
Abstract: An algorithm for evaluating the Discrete Fourier Transform (DFT) at a particular output frequency is derived using a technique called summation by parts (SBP). This technique is shown to reduce the number of multiplications and the number of bits per multiplicative coefficient needed to implement the DFT. For many transform lengths, only two one-bit multiplications or simple memory shifts are needed to implement the DFT. When the DFT length is prime, a SBP algorithm designed for a fixed output frequency index can be used to evaluate the DFT at any other non-zero output frequency index simply by appropriately changing the order of the input sequence.

22 citations


Journal ArticleDOI
TL;DR: The results indicate that the FFT implemented with the logarithmic number system provides better signal-to-noise performance than that implemented with a fixed-point or floating-point number system.
Abstract: When a fast Fourier transform (FFT) is implemented on a digital computer or with special-purpose hardware, quantization errors will arise due to finite word lengths in the digital system. This correspondence presents an analysis of error accumulation due to coefficient rounding in the FFT implemented with a logarithmic number system. The theoretical result of the coefficient roundoff error analysis is verified by computer simulations. The results indicate that the FFT implemented with the logarithmic number system provides better signal-to-noise performance than that implemented with a fixed-point or floating-point number system.

DOI
01 Dec 1987
TL;DR: The described transform method, a link between an FFT and a single DFT sum, is compared with the use of separate DFT sums and offers improved accuracy and applicability relative to the existing type of zoom transform.
Abstract: After a brief review of the properties of the discrete Fourier transformation (DFT) and the fast Fourier transformation (FFT), two representatives of partial (narrowband) spectrum computation methods, the zoom FFT and a specific type of zoom transform, are introduced. With respect to the FFT, only the zoom transform sufficiently fits demands concerning memory space and computational speed. Moreover, it offers improved accuracy and applicability relative to the existing type of zoom transform. The described transform method, a link between an FFT and a single DFT sum, is compared with the use of separate DFT sums.

Journal ArticleDOI
TL;DR: This correspondence presents details of a new implementation of the prime factor FFT algorithm (PFA) for computing the discrete Fourier transform (DFT) that saves about 40 percent of the execution time of the conventional one.
Abstract: This correspondence presents details of a new implementation of the prime factor FFT algorithm (PFA) for computing the discrete Fourier transform (DFT). This implementation applies a program generation technique to the PFA algorithm and saves about 40 percent of the execution time of the conventional one.

01 Jan 1987
TL;DR: The error performance of the radix-2 decimation-in-time and decimation-in-frequency forms of the fast Hartley transform algorithm has been studied, and the expressions obtained are similar to those obtained in the case of the FFT for the corresponding cases.
Abstract: The fast Hartley transform (FHT) has been proposed recently by Bracewell. It is closely related to the fast Fourier transform (FFT). However, it has two advantages over the FFT: the forward and inverse transforms are the same, and the Hartley transformed outputs are real-valued rather than complex. Hence, the speed of computation can be increased by 50% for performing fast convolution or correlation. These properties have led to investigations into using the Hartley transform for time-efficient discrete Fourier analysis of real signals. In this paper, the error performance of the radix-2 decimation-in-time and decimation-in-frequency forms of the fast Hartley transform algorithm has been studied. The analysis assumes fixed-point sign-magnitude arithmetic. The analysis is carried out for the decimation-in-time and decimation-in-frequency forms of the fast Hartley transform algorithm, assuming all the errors to be uncorrelated. Then, the analysis is carried out assuming the truncation errors to be correlated, in the case of the decimation-in-frequency form of the FHT. The predicted results are compared with computer simulation studies and with those obtained in the case of the fast Fourier transform. It has been observed that the expressions obtained in the analysis are similar to those obtained in the case of the FFT for the corresponding cases.

Journal ArticleDOI
TL;DR: In this article, the split vector radix was used to develop a 2D fast Fourier transform (FFT) algorithm; it was performed "in-place", and required no matrix transpose operation.
Abstract: The split vector radix is used to develop a 2-D fast Fourier transform (FFT) algorithm; it is performed "in-place," and requires no matrix transpose operation. This method greatly improves the conventional vector radix 2-D FFT; an overall saving of about 23 percent in complex multiplications for a typical 2048 × 2048 array could be obtained.

Journal ArticleDOI
TL;DR: An in-place version of the FFT is presented which takes a real sequence in natural order and produces the transform in scrambled order, which requires half of the operations and storage of the complex algorithms.
Abstract: It has long been known that an in-place version of the Fast Fourier Transform (FFT) exists for real sequences of data. More recently, in-place FFTs have been devised for real sequences with even, odd, or quarter wave symmetries. All of these symmetric FFTs take the input sequence in scrambled (bit-reversed) order and produce the transform sequence in natural order. For many applications, this is the opposite of what is needed, i.e., one would like to provide the input sequence in natural order. In this paper, an in-place version of the FFT is presented which takes a real sequence in natural order and produces the transform in scrambled order. The algorithm requires half of the operations and storage of the complex algorithms. Analogous in-place algorithms are also given for naturally ordered even, odd, quarter wave even and quarter wave odd sequences.
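
The claim that a real-input transform needs roughly half the work of a complex one can be illustrated with the classic packing trick: an N-point real DFT is obtained from one N/2-point complex DFT plus a cheap unpacking pass. This sketch is not the in-place algorithm from the paper above, and it returns output in natural order. The names `dft` and `real_dft_half` are hypothetical, and the naive `dft` stands in for any complex FFT routine.

```python
import cmath


def dft(z):
    """Naive O(N^2) complex DFT, standing in for any complex FFT routine."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def real_dft_half(x):
    """Return X[0..N/2] of the DFT of a real sequence x of even length N.

    The remaining bins follow from conjugate symmetry: X[N-k] = conj(X[k]).
    """
    n = len(x)
    if n == 0 or n % 2:
        raise ValueError("length must be even and non-zero")
    half = n // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * m], x[2 * m + 1]) for m in range(half)]
    zf = dft(z)  # a single half-length complex transform
    out = []
    for k in range(half + 1):
        zk = zf[k % half]
        zc = zf[(half - k) % half].conjugate()
        even = (zk + zc) / 2    # DFT of x[0], x[2], x[4], ...
        odd = (zk - zc) / 2j    # DFT of x[1], x[3], x[5], ...
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

Only N/2 + 1 output bins are stored, which mirrors the halved storage mentioned in the abstract.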

Proceedings ArticleDOI
01 Apr 1987
TL;DR: Split vector radix is used to develop a 2D fast Fourier transform algorithm that is performed "in-place" and requires no matrix transpose operation; an overall saving of about 30% in complex multiplications for a typical 1024 × 1024 array could be obtained.
Abstract: Split vector radix is used to develop a 2D fast Fourier transform algorithm; it is performed "in-place" and requires no matrix transpose operation. This method greatly improves the conventional vector radix 2D FFT; an overall saving of about 30% in complex multiplications for a typical 1024 × 1024 array could be obtained.

Proceedings ArticleDOI
C.S. Burrus1
06 Apr 1987
TL;DR: The new result in this paper is the observation that the radix-4, radix-8, or any radix-2^m FFT can be modified to give the output in the same bit-reversed order as the radix-2 FFT.
Abstract: The traditional Cooley-Tukey and the prime factor FFT algorithms either produce the output in scrambled order or the input data order must be prescrambled. Several methods for scrambling and unscrambling the DFT are presented. The new result in this paper is the observation that the radix-4, radix-8, or any radix-2^m FFT can be modified to give the output in the same bit-reversed order as the radix-2 FFT.

Proceedings ArticleDOI
01 Jan 1987
TL;DR: A new algorithm is derived: the decimation-in-time real-valued split-radix FFT, which can transform any length N = 2^M sequence but uses fewer operations than any other known real-valued FFT, and which is the fastest Cooley-Tukey real-valued transform in use.
Abstract: Since 1965, when Cooley and Tukey published their famous paper on the radix-2 fast Fourier transform, much effort has gone into developing even more efficient algorithms. Most algorithms, however, do not directly handle real-valued data very well, and there exist several ways to solve that problem. This paper derives a new algorithm: the decimation-in-time real-valued split-radix FFT, which can transform any length N = 2^M sequence but uses fewer operations than any other known real-valued FFT, and which is the fastest Cooley-Tukey real-valued transform in use. Instead of breaking the transform down equally as in traditional algorithms, the even and odd indexed parts are broken down differently in the split-radix algorithm. This gives significant savings in both additions and multiplications over any fixed-radix Cooley-Tukey FFT. The paper compares the split-radix transform with several of the already existing methods, such as the Hartley transform, the prime factor, Winograd, and Cooley-Tukey algorithms, and shows in which cases a specific algorithm is faster than the rest.
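
The unequal decomposition mentioned above (even-indexed samples split radix-2, odd-indexed samples split radix-4) can be sketched as a short recursive routine. This is a plain complex-valued, out-of-place illustration of the split-radix idea, not the real-valued in-place algorithm the paper derives; the function name `split_radix_fft` is hypothetical and no attempt is made to minimize the operation count.

```python
import cmath


def split_radix_fft(x):
    """Recursive decimation-in-time split-radix DFT for len(x) a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]

    even = split_radix_fft(x[0::2])   # length N/2: samples x[2m]
    odd1 = split_radix_fft(x[1::4])   # length N/4: samples x[4m+1]
    odd3 = split_radix_fft(x[3::4])   # length N/4: samples x[4m+3]

    quarter = n // 4
    out = [0j] * n
    for k in range(quarter):
        a = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]   # W^k  * O1[k]
        b = cmath.exp(-6j * cmath.pi * k / n) * odd3[k]   # W^3k * O3[k]
        s = a + b
        d = -1j * (a - b)
        out[k] = even[k] + s
        out[k + 2 * quarter] = even[k] - s
        out[k + quarter] = even[k + quarter] + d
        out[k + 3 * quarter] = even[k + quarter] - d
    return out
```

Each pass through the loop produces four outputs from two twiddle multiplications, which is where the savings over a fixed radix-2 butterfly come from.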

Book ChapterDOI
01 Jan 1987
TL;DR: The chapter presents a comparison of the number of arithmetic operations required to compute various FFT algorithms and presents a simple, easy-to-visualize graphical technique to specify digital word lengths in a typical spectral analysis system.
Abstract: The fast Fourier transform (FFT) computes the discrete Fourier transform (DFT) using a reduced number of arithmetic operations as compared to brute-force evaluation of the DFT. The method is efficient, because it eliminates redundancies that result from adding certain data sequence values after they have been multiplied by the same factors of fixed complex constants during the evaluation of different DFT transform coefficients. The efficiency is achieved at the expense of reordering the data sequence and/or transform sequence, but the additional expense is small compared to the reduction in multiplications and additions. The chapter reviews one-dimensional (1-D) and two-dimensional (2-D) DFTs, presents 2-D flow diagrams and the equivalent operations in matrix format, and explains DFT and FFT matrix representation. The first FFTs result from a mixed-radix integer representation (MIR) that includes binary, decimal, and octal integers. The chapter presents a comparison of the number of arithmetic operations required to compute various FFT algorithms. Many real-time FFTs are mechanized with dedicated, fixed-point hardware. The chapter also discusses the determination of the number of bits needed to fully utilize these FFT processors and presents a simple, easy-to-visualize graphical technique to specify digital word lengths in a typical spectral analysis system.

Journal ArticleDOI
TL;DR: The fast Fourier transform (FFT) for processing large amounts of data, such as images, is considered; as a structure for the one-dimensional FFT processor, the constant-geometry type FFT algorithm and bit-serial pipeline floating-point arithmetic are discussed.
Abstract: This paper considers the fast Fourier transform (FFT) for processing large amounts of data, such as images. As a structure for the one-dimensional FFT processor, aiming at eliminating some restrictions in VLSI design, the constant-geometry type FFT algorithm and the bit-serial pipeline floating-point arithmetic are discussed. The major results are as follows. (1) Using the constant-geometry algorithm, the memory elements that perform the rearrangement characteristic of the FFT can be realized by a uniform structure and uniform control scheme throughout the stages. The memory element for an N-point FFT can be constructed as a simple and regular structure using two N/2-stage shift registers. (2) The multiplication cell with code extender, the serial structure of the normalization circuit and the shifter, and the parallel operation covering a longer length than the input word length are employed. With these schemes, the pipeline operation of floating-point arithmetic is realized without a guard bit, and the restriction in VLSI using the butterfly elements can be reduced drastically. (3) As an additional effect of the regular structure of the memory element, automatic defect-tolerant operation is made possible, using the effective k-out-of-n redundant structure and self-testing. With this scheme, the restriction on the VLSI chip area can be reduced when the number of sampling points is increased. (4) The one-dimensional FFT processor can be realized as a modular structure, which is a cascade connection of two kinds of VLSI, i.e., butterfly elements and memory elements.


01 Jan 1987
TL;DR: The Discrete Fourier Transform is of fundamental importance for digital signal processing in the frequency domain and many efficient algorithms exist to implement the transform, each exploiting some properties of the DFT.
Abstract: The Discrete Fourier Transform (DFT) is of fundamental importance for digital signal processing in the frequency domain. Many efficient algorithms exist to implement the transform (Cooley and Tukey, 1965, Brigham, 1974, Burrus and Parks, 1985, Duhamel, 1986), each exploiting some properties of the DFT. Two-dimensional (2D) DFT's are used in 2D signal processing such as image processing, implementation of FIR filters and 2D spectral analysis, etc. (Dudgeon and Mersereau, 1984). There are at least two approaches to performing 2D DFT's. One is the row-column method that sequentially applies one-dimensional FFT's to the rows and columns of the data matrix. The other is the vector radix (VR) FFT method (Rivard, 1977 and Harris et al., 1977) that applies a two-dimensional process to the data. Although less well known, the latter is more efficient in terms of the number of multiplications and additions than its 1D counterpart.
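
The row-column method named in the abstract above can be sketched directly: transform every row with a 1-D routine, then transform every column of the result. The sketch below uses a naive O(N^2) `dft` as a stand-in for a 1-D FFT. It does not show the vector radix method, and the function names are hypothetical.

```python
import cmath


def dft(v):
    """Naive O(N^2) 1-D DFT, standing in for any 1-D FFT routine."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def dft2_row_column(a):
    """2-D DFT of a rectangular list of lists via the row-column method."""
    rows = [dft(row) for row in a]                   # 1-D transforms along each row
    cols = [dft(list(col)) for col in zip(*rows)]    # 1-D transforms along each column
    return [list(row) for row in zip(*cols)]         # transpose back to [k1][k2] layout
```

The explicit transposes in this sketch are the kind of data movement that in-place vector radix schemes aim to avoid.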


01 Jan 1987
TL;DR: This paper shows that a structure of first-order digital filter form (Curtis, 1983) is most suitable for the realization of Discrete Fourier Transforms using a software approach, and the significance of extending sequence lengths to a power of primes is discussed.
Abstract: This paper shows that a structure of first-order digital filter form (Curtis, 1983; Wickenden, Fernando and Constantinides, 1984) is most suitable for the realization of Discrete Fourier Transforms using a software approach. First, a structure for the software technique is presented and then the significance of extending sequence lengths to a power of primes is discussed. Besides converting the hardware algorithm into a software algorithm, the use of very short basic length DFTs (e.g. 3, 5, 7, ...) is proposed instead of long length DFTs as in previous approaches. An in-place and in-order computation is also used for the present realisation. This approach improves both speed and accuracy. The program length of this approach is much shorter than the program length of the Winograd Fourier Transform Algorithm.

08 Apr 1987
TL;DR: The general motivation for the work comes from the need to validate the FFT algorithm when it is newly implemented on a computer or when new techniques or devices are added to a computer facility to evaluate discrete Fourier transforms.
Abstract: A method is described for validating fast Fourier transforms (FFTs) based on the use of simple input functions whose discrete Fourier transforms can be evaluated in closed form. Explicit analytical results are developed for one-dimensional and two-dimensional discrete Fourier transforms. The analytical results are easily generalized to higher dimensions. The results offer a means for validating the FFT algorithm in one, two, or higher dimensional settings. The general motivation for the work comes from the need to validate the FFT algorithm when it is newly implemented on a computer or when new techniques or devices are added to a computer facility to evaluate discrete Fourier transforms. Keywords: Computer Program Verification.
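
The validation idea above, feeding a transform inputs whose DFT is known in closed form, can be sketched as a small check harness. The two test signals used here (a shifted unit impulse and a single complex exponential) are illustrative choices and are not necessarily the functions used in the report. The naive `dft` stands in for the FFT under test, and all function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, standing in for the FFT implementation under test."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def max_error_impulse(transform, n, shift):
    """Unit impulse at index `shift`: the exact DFT is exp(-2*pi*i*k*shift/n)."""
    x = [0.0] * n
    x[shift] = 1.0
    expected = [cmath.exp(-2j * cmath.pi * k * shift / n) for k in range(n)]
    return max(abs(g - e) for g, e in zip(transform(x), expected))


def max_error_exponential(transform, n, freq):
    """Input exp(+2*pi*i*freq*t/n): the exact DFT is n at bin `freq`, 0 elsewhere."""
    x = [cmath.exp(2j * cmath.pi * freq * t / n) for t in range(n)]
    expected = [complex(n) if k == freq else 0j for k in range(n)]
    return max(abs(g - e) for g, e in zip(transform(x), expected))
```

A candidate FFT routine would be passed as `transform`, and the returned maximum error compared against a tolerance suited to the arithmetic in use.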