Showing papers on "Split-radix FFT algorithm published in 1991"


Journal ArticleDOI
TL;DR: In this paper, the fast Fourier transform (FFT) technique is utilized to simulate a multivariate nonstationary Gaussian random process with prescribed evolutionary spectral description, and a stochastic decomposition technique facilitates utilization of the FFT algorithm.
Abstract: The fast Fourier transform (FFT) technique is utilized to simulate a multivariate nonstationary Gaussian random process with prescribed evolutionary spectral description. A stochastic decomposition technique facilitates utilization of the FFT algorithm. The decomposed spectral matrix is expanded into a weighted summation of basic functions and time‐dependent weights that are simulated by the FFT algorithm. The desired evolutionary spectral characteristics of the multivariate unidimensional process may be prescribed in a closed form or a set of numerical values at discrete frequencies. The effectiveness of the proposed technique is demonstrated by means of three examples with different evolutionary spectral characteristics derived from past earthquake events. The closeness between the target and the corresponding estimated correlation structure suggests that the simulated time series reflect the prescribed probabilistic characteristics extremely well. The simulation approach is computationally efficient, p...

104 citations


Journal ArticleDOI
TL;DR: A structured approach is used to generate a fast algorithm to compute two-dimensional discrete cosine transforms (DCT) based on Hou's method, and this algorithm is described by logic diagrams which reveal the relationship between the 2-D algorithm and its 1-D counterpart.
Abstract: A structured approach is used to generate a fast algorithm to compute two-dimensional discrete cosine transforms (DCT) based on Hou's method. Hou's algorithm is extended to the 2-D case using an approach presented in both matrix and diagrammatical forms. The matrix approach is discussed, and this forms a basis on which a 2-D fast DCT algorithm is derived. It is shown that this matrix method has a structure similar to that of the 1-D Cooley-Tukey fast Fourier transform (FFT) algorithm. Then the decimation-in-frequency (DIF) 2-D fast DCT algorithm is presented using matrix forms which use the tensor (or Kronecker) product as a construction tool. Finally, the 2-D algorithm is described by logic diagrams which reveal the relationship between the 2-D algorithm and its 1-D counterpart. As an example, the logic diagram of an 8-point × 8-point 2-D DCT using the new 2-D DCT algorithm is generated through a simple procedure.

69 citations


Journal ArticleDOI
TL;DR: It is shown how the familiar radix-2 Fast Fourier Transform algorithm can be extended to radix-3, radix-4, radix-5, and finally to mixed-radix FFTs, and how these new versions of the FFT require neither an unscrambling step nor work space.
Abstract: It has recently been shown that the familiar radix-2 Fast Fourier Transform (FFT) algorithm can be made both self-sorting and in-place, two useful properties which were previously thought to be mutually exclusive. In this paper the procedure is demonstrated and it is shown how it can be extended to radix-3, radix-4, radix-5, and finally to mixed-radix FFTs. These new versions of the FFT algorithm require neither an unscrambling step nor work space. Implementation on vector computers (for the case of multiple transforms) is discussed. Timing experiments on the Cray X-MP demonstrate that these new variants of the FFT run just as fast as older self-sorting routines which required work space.

30 citations


Journal ArticleDOI
TL;DR: The bit-reversal counter algorithm of B. Gold and C.M. Rader (1969) bit-reverses a continuous sequence of N numbers by running a loop N - 1 times, and the heuristic approach presented repeats a similar loop only N/4 times.
Abstract: The bit-reversal counter algorithm of B. Gold and C.M. Rader (1969) bit-reverses a continuous sequence of N numbers by running a loop N - 1 times. The heuristic approach presented repeats a similar loop only N/4 times.

28 citations
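
For reference, the baseline that the entry above improves on can be sketched as follows. This is a minimal Python sketch of a classical bit-reversed counter whose loop runs N - 1 times; it is not the N/4-iteration heuristic from the paper, and the function name `bit_reverse_permute` is an illustrative assumption.

```python
def bit_reverse_permute(data):
    """Reorder a list whose length is a power of two into bit-reversed order, in place."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    j = 0  # j always holds the bit-reversed value of i
    for i in range(1, n):  # the loop body runs N - 1 times
        # Increment j as a bit-reversed counter: the carry propagates
        # from the most significant bit towards the least significant one.
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:  # swap each pair only once
            data[i], data[j] = data[j], data[i]
    return data
```

Applying the function twice restores the original order, since bit reversal is its own inverse.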


Patent
03 Sep 1991
TL;DR: In this paper, a pipelined Fast Fourier Transform (FFT) architecture includes a memory for storing complex number data and a data path coupled to the memory for accessing R complex numbers therefrom, for computing an FFT butterfly, and storing R results from the FFT computation in the memory during one pipeline cycle.
Abstract: A pipelined Fast Fourier Transform (FFT) architecture includes a memory for storing complex number data. A pipelined data path is coupled to the memory for accessing R complex number data therefrom, for computing an FFT butterfly, and storing R results from the FFT butterfly computation in the memory during one pipeline cycle.

27 citations


Journal ArticleDOI
TL;DR: A parallel architecture especially designed for a synthetic-aperture-radar (SAR) processing algorithm based on an appropriate two-dimensional fast Fourier transform (FFT) code is presented, allowing drastic reduction of the processing time, preserving elaboration accuracy and flexibility.
Abstract: A parallel architecture especially designed for a synthetic-aperture-radar (SAR) processing algorithm based on an appropriate two-dimensional fast Fourier transform (FFT) code is presented. The algorithm is briefly summarized, and the FFT code is given for the one-dimensional case, although all results can be immediately generalized to the double FFT. The computer architecture, which consists of a toroidal net with transputers on each node, is described. Parametric expressions for the computational time of the net versus the number of nodes are derived. The architecture allows drastic reduction of the processing time, preserving elaboration accuracy and flexibility.

27 citations


Patent
21 Oct 1991
TL;DR: In this paper, a radix-12 FFT is presented, where complex data are represented in a 1, W_3 coordinate system rather than in a classic 1, j coordinate system, and the only multiplicative scalar in the complex twiddle factors is the reciprocal of the square root of 3, which appears six times and which, by conversion to canonical signed digit code, can be accurately expressed by 5 adds.
Abstract: Using classic Fast Fourier Transform (FFT) rules, a radix-12 FFT is composed of a first tier of 2 multiplierless radix-6 transformers followed by multiplierless radix-2 transformers, or by its transpose configuration. Complex data are represented in a 1, W_3 coordinate system rather than in a classic 1, j coordinate system. The only multiplicative scalar in the complex twiddle factors is the reciprocal of the square root of 3, which appears six times and which, by conversion to canonical signed digit code, can be accurately expressed by 5 adds. As a consequence the complex twiddle factor multipliers and ancillary address reduce to a total of 144 real adds required to perform the entire complex 12-point FFT.

23 citations


Journal ArticleDOI
01 Sep 1991
TL;DR: The Bluestein FFT may be the algorithm of choice on multiprocessors, particularly those with the hypercube architecture because of its minimal communication requirements and for most values of N it is also shown to be superior to another alternative, namely parallel multiplication.
Abstract: The original Cooley-Tukey FFT was published in 1965 and presented for sequences with length N equal to a power of two. However, in the same paper they noted that their algorithm could be generalized to composite N in which the length of the sequence was a product of small primes. In 1967, Bergland presented an algorithm for composite N and variants of his mixed radix FFT are currently in wide use. In 1968, Bluestein presented an FFT for arbitrary N including large primes. However, for composite N, Bluestein's FFT was not competitive with Bergland's FFT. Since it is usually possible to select a composite N, Bluestein's FFT did not receive much attention. Nevertheless, because of its minimal communication requirements, the Bluestein FFT may be the algorithm of choice on multiprocessors, particularly those with the hypercube architecture. In contrast to the mixed radix FFT, the communication pattern of the Bluestein FFT maps quite well onto the hypercube. With P = 2^d processors, an ordered Bluestein FFT requires 2d communication cycles with packet length N/2P, which is comparable to the requirements of a power of two FFT. For fine-grain computations, the Bluestein FFT requires 20 log_2 N computational cycles. Although this is double that required for a mixed radix FFT, the Bluestein FFT may nevertheless be preferred because of its lower communication costs. For most values of N it is also shown to be superior to another alternative, namely parallel multiplication.

22 citations
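
To make the Bluestein idea in the entry above concrete, here is a minimal serial sketch in Python. It rewrites a length-N DFT as a circular convolution with a chirp and evaluates that convolution with power-of-two FFTs. It is a single-processor illustration only, not the hypercube implementation the abstract describes, and the names `fft_pow2` and `bluestein_dft` are illustrative assumptions.

```python
import cmath


def fft_pow2(x, inverse=False):
    """Recursive radix-2 FFT for power-of-two lengths (no 1/N scaling)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft_pow2(x[0::2], inverse)
    odd = fft_pow2(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of arbitrary length N, using nk = (n^2 + k^2 - (k - n)^2) / 2."""
    n = len(x)
    if n == 0:
        return []
    # chirp[k] = exp(-i*pi*k^2/N); k^2 is reduced mod 2N to keep the angle small.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    m = 1
    while m < 2 * n - 1:  # convolution length: a power of two >= 2N - 1
        m *= 2
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = chirp[0].conjugate()
    for k in range(1, n):  # the conjugate chirp is symmetric in k
        b[k] = b[m - k] = chirp[k].conjugate()
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    return [chirp[k] * conv[k] / m for k in range(n)]
```

The three power-of-two FFTs are what make the method usable for prime N, at the cost of extra arithmetic compared with a mixed-radix FFT on composite N.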


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors introduce the pruned short-time FFT, a novel computational structure for efficiently computing the STFT with dense temporal sampling that achieves the same computational savings as the Goertzel algorithm, but is unconditionally stable.
Abstract: Although most applications which use the short-time Fourier transform (STFT) temporally downsample the output, some applications exploit a dense temporal sampling of the STFT. One example, coded-division multiple-beam sonar, is discussed. Given a need for the densely sampled STFT, the complexity of the computation can be reduced from O(N log N) for the general short-time FFT structure to O(N) using the Goertzel algorithm. The authors introduce the pruned short-time FFT, a novel computational structure for efficiently computing the STFT with dense temporal sampling. The pruned FFT achieves the same computational savings as the Goertzel algorithm, but is unconditionally stable.

22 citations
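
The Goertzel algorithm that the entry above uses as its point of comparison can be sketched as a second-order recursion that produces one DFT bin in O(N) operations. This is a minimal single-block Python sketch, not the sliding or pruned short-time structure proposed in the paper; the function name `goertzel` is an illustrative assumption.

```python
import cmath
import math


def goertzel(x, k):
    """Return DFT bin X[k] of the length-N sequence x via Goertzel's recursion."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0  # s[n-1] and s[n-2] of the recursion
    for sample in x:
        s0 = sample + coeff * s1 - s2  # s[n] = x[n] + 2cos(w) s[n-1] - s[n-2]
        s2, s1 = s1, s0
    # After N samples: X[k] = exp(jw) * s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s1 - s2
```

The recursion has poles on the unit circle, which is the source of the conditional stability that the abstract contrasts with the pruned FFT.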


Journal ArticleDOI
TL;DR: It is shown that the equalization of FFTs leads to results which are different from the widely used intuitive ones and the formulae of the method can be easily adapted for deriving algorithms for the cosine/sine DFT.
Abstract: A general method of deriving DFT (discrete Fourier transform) algorithms, generalised fast Fourier transform algorithms, is presented. It is shown that a special case of the method is equivalent to nesting of FFTs. The application of the method to the case where N has mutually prime factors results in a new interpretation of the permutations characteristic of this class of algorithms. It is shown that the equalization of FFTs leads to results which are different from the widely used intuitive ones. The high efficiency of split-radix FFTs is explained. It is shown that the formulae of the method can be easily adapted for deriving algorithms for the cosine/sine DFT. A set of FFTs that has smaller arithmetical and/or memory complexities than any algorithm known is presented. In particular, a method of deriving split-radix-2^s FFTs requiring N log_2 N - 3N + 4 real multiplications and 3N log_2 N - 3N + 4 additions for any s > 1 is presented.

20 citations
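
As background for the split-radix entry above, the following Python sketch shows the classical radix-2/4 split-radix decomposition (one half-length transform of the even samples plus two quarter-length transforms of the odd samples) together with a helper that evaluates the operation counts quoted in the abstract. It is a plain recursive illustration, not the split-radix-2^s derivation of the paper, and both function names are illustrative assumptions.

```python
import cmath


def split_radix_fft(x):
    """Recursive split-radix FFT for a power-of-two length N."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    even = split_radix_fft(x[0::2])  # length N/2, samples x[2n]
    odd1 = split_radix_fft(x[1::4])  # length N/4, samples x[4n+1]
    odd3 = split_radix_fft(x[3::4])  # length N/4, samples x[4n+3]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * odd3[k]
        s = t1 + t3
        d = -1j * (t1 - t3)
        out[k] = even[k] + s
        out[k + 2 * q] = even[k] - s
        out[k + q] = even[k + q] + d
        out[k + 3 * q] = even[k + q] - d
    return out


def quoted_operation_counts(n):
    """Counts from the abstract: (N log2 N - 3N + 4, 3N log2 N - 3N + 4)."""
    m = n.bit_length() - 1  # log2 N for a power-of-two N
    return n * m - 3 * n + 4, 3 * n * m - 3 * n + 4
```

For N = 8 the quoted formulas give 4 real multiplications and 52 real additions.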


Journal ArticleDOI
TL;DR: A modified fast cosine transform (FCT) algorithm is presented featuring the following three properties: the entire calculation is performed using arrays half the size of what would be required using a common fast Fourier transform (FFT).

Journal ArticleDOI
TL;DR: An efficient algorithm (involving real arithmetic only) for N-point DFT is developed and used as the basic building block for developing the real valued fast Fourier transform (FFT).
Abstract: The authors earlier developed a fast recursive algorithm for the discrete sine transform (see IEEE Trans. Acoust. Speech Signal Process., vol.38, no.3, p.553-7, 1990). This algorithm is used as the basic building block for developing the real valued fast Fourier transform (FFT). It is assumed that the input sequence is real and of length N, an integer power of 2. The N-point discrete Fourier transform (DFT) of a real sequence can be implemented via the real (cos DFT) and imaginary (sin DFT) components. The N-point cos DFT in turn can be developed from N/2-point cos DFT and N/4-point discrete sine transform (DST). Similarly, the N-point sin DFT can be developed from N/2-point sin DFT and N/4-point DST. Using this approach, an efficient algorithm (involving real arithmetic only) for N-point DFT is developed.

Journal ArticleDOI
TL;DR: The prime factor algorithm was implemented on a hypercube using CrOS III communication routines, taking 120 ms to compute the DFT of 5040 complex points using 32 nodes of the Caltech-JPL MARK III Hypercube, versus 105 ms for 4096 complex points using the Cooley-Tukey algorithm with the same hardware configuration.
Abstract: The prime factor algorithm (PFA) is an efficient discrete Fourier transform (DFT) computation algorithm in which a one-dimensional DFT is turned into a multidimensional DFT, consisting of a few short DFTs whose lengths are mutually prime, and then an efficient algorithm is used for the short DFTs. The PFA was implemented on a hypercube using CrOS III communication routines, taking 120 ms to compute the DFT of 5040 complex points using 32 nodes of the Caltech-JPL MARK III Hypercube. It took 105 ms to do a DFT of 4096 complex points using the Cooley-Tukey algorithm with the same hardware configuration. The performance of hypercubes MARK III, NCUBE, and iPSC and the relative importance of communication and calculation are analyzed. With the current communication speed the Cooley-Tukey algorithm performs fast on a massively concurrent processor and the PFA is advantageous when the number of processors is less than 64 or so. The experience with using the PFA also serves as a useful guide to a multidimensional fast Fourier transform implementation using any algorithm.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A tool to aid in the automated VLSI implementation of the discrete Fourier transform (DFT) is described and a transformation technique between a symbolic computation environment and a behavioral synthesis environment for the transferring of functional primitives is discussed.
Abstract: A tool to aid in the automated VLSI implementation of the discrete Fourier transform (DFT) is described. This tool is tensor product algebra, a branch of finite-dimensional multilinear algebra. Tensor product formulations of fast Fourier transform (FFT) algorithms to compute the DFT are presented. These mathematical formulations are manipulated, using properties of tensor product algebra, to obtain variants that adapt to performance constraints in a VLSI implementation process. The possibility of automating this procedure by processing these mathematical formulations or expressions in a behavioral synthesis environment of a silicon compilation system is discussed. A transformation technique between a symbolic computation environment and a behavioral synthesis environment for the transferring of functional primitives is discussed.

Patent
Brian R. Mercy
13 Jun 1991
TL;DR: The balanced coefficient method as discussed by the authors reduces the number of coefficients required to process an FFT of size 2p from a total of 2p coefficients to p times the square root of 2p.
Abstract: A method and apparatus for processing a digital signal by a fast Fourier transformation using balanced coefficients. The balanced coefficient method reduces the number of coefficients required to process an FFT of size 2p from a total of 2p coefficients to p times the square root of 2p. The new system employs a reduced number of coefficients in a unique addressing scheme to produce a cheaper, lighter, smaller, cooler FFT processor which uses less power and is more reliable.

Journal ArticleDOI
TL;DR: A fast algorithm is presented which computes the two-dimensional Hartley transform using the decimation in frequency decomposition and, due to its in-place property, it does not require midmemory devices or matrix transposition.
Abstract: A fast algorithm is presented which computes the two-dimensional Hartley transform. This algorithm is referred to as the split vector radix algorithm. It uses the decimation in frequency decomposition and, due to its in-place property, it does not require midmemory devices or matrix transposition. Its computational structure is simpler than that of the algorithm of L.Z. Chen (1983), and it is easy to program. Compared with the vector radix algorithm of R. Kumaresan and P.K. Gupta (1986), the proposed algorithm saves about 35% of the multiplications and 10% of the additions for the discrete Fourier transform (DFT) of a 4096 × 4096 real valued input sequence.

Journal ArticleDOI
TL;DR: A matrix-based interpretation of the computation of a minimum and sufficient set of addresses, along with corresponding bit-reversed addresses, is presented, and typical software and hardware solutions are described.
Abstract: The computation of a minimum and sufficient set of addresses, along with corresponding bit-reversed addresses, is necessary to do unscrambling in conventional FFT (fast Fourier transform) procedures. A matrix-based interpretation of this problem is presented, and typical software and hardware solutions are described. An overall relationship between the two sets of primary and secondary indices is shown, obviating any additional relationship at individual pairwise level. Although the algorithms have been discussed on a radix-2 basis, they can be modified for a radix if necessary.

Journal ArticleDOI
TL;DR: In this article, the disadvantages of numerical inversion of the Laplace transform via the conventional fast Fourier transform (FFT) are identified and an improved method is presented to remedy them.
Abstract: The disadvantages of numerical inversion of the Laplace transform via the conventional fast Fourier transform (FFT) are identified and an improved method is presented to remedy them. The improved method is based on introducing a new integration step length Δω = π/(mT) for trapezoidal-rule approximation of the Bromwich integral, in which a new parameter, m, is introduced for controlling the accuracy of the numerical integration. Naturally, this method leads to multiple sets of complex FFT computations. A new inversion formula is derived such that N equally spaced samples of the inverse Laplace transform function can be obtained by (m/2) + 1 sets of N-point complex FFT computations or by m sets of real fast Hartley transform (FHT) computations.

Proceedings ArticleDOI
04 Nov 1991
TL;DR: A novel filter-bank interpretation of the procedure is presented, allowing understanding of the errors occurring in the method's use for an appropriate partial-band transform, and the novel algorithm is compared to existing methods especially for the computation of a limited number of frequency points.
Abstract: A detailed analysis of a newly proposed fast Fourier transform (FFT) type algorithm is presented. Several variants are introduced in the form of signal-flow graph (SFG) descriptions. The main characteristic of the approach is the frequency-separation property of the subsequences involved in the decomposition process. A novel filter-bank interpretation of the procedure is presented, allowing understanding of the errors occurring in the method's use for an appropriate partial-band transform. These errors are studied in depth to obtain general formulas describing their nature, whatever the number and type of decomposition stages might be. The computational complexity of the algorithm is analyzed both theoretically and in terms of running-time measurements. With these insights, the novel algorithm is compared to existing methods especially for the computation of a limited number of frequency points. Previously reported complexity estimates are refined and extended.

Journal ArticleDOI
TL;DR: A bus-oriented multiprocessor architecture specialized for computation of the discrete Fourier transform (DFT) of a length N = 2^M sequential data stream is developed and allows flexibility in the number of processors and in the choice of a fast Fourier transform (FFT) algorithm.
Abstract: A bus-oriented multiprocessor architecture specialized for computation of the discrete Fourier transform (DFT) of a length N = 2^M sequential data stream is developed. The architecture distributes computation and memory requirements evenly among the processors and allows flexibility in the number of processors and in the choice of a fast Fourier transform (FFT) algorithm. With three buses, the bus bandwidth equals the input data rate. A single time-multiplexed bus with a bandwidth of three times the input data rate can alternatively be used. The architecture requires processors that have identical hardware, which makes it more attractive than the cascade (pipeline) FFT for multiprocessor implementation.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A comparative assessment of the computational complexity of several Gabor transform algorithms is given, with results in the range O(P^2) to O(P log_2 P), where P is the number of data points being transformed.
Abstract: A comparative assessment of the computational complexity of several Gabor transform algorithms is given, with results in the range O(P^2) to O(P log_2 P), where P is the number of data points being transformed. Among the results is a novel algorithm of lower complexity than previously known FFT (fast Fourier transform) based methods. The most efficient of the methods, which uses the Zak transform as an operational calculus, performs the Gabor analysis and synthesis transforms with a complexity comparable to that of the FFT.

Journal ArticleDOI
TL;DR: In this paper, a new formulation of the discrete Wigner-Ville distribution is presented, which can be implemented directly using standard fast Fourier transform techniques; for a non-negative frequency resolution of N points, only an N-point FFT is needed.
Abstract: A new formulation of the discrete Wigner-Ville distribution is presented which can be implemented directly using standard fast Fourier transform techniques. For a non-negative frequency resolution of N points, only an N point FFT is needed.

Journal ArticleDOI
V. Nagesha
TL;DR: Efficient fast Fourier transform algorithms to compute the forward and inverse discrete Fourier transforms of a sequence with linear-phase characteristic are examined and can be easily written by simple restructuring of a complex FFT algorithm.
Abstract: Efficient fast Fourier transform (FFT) algorithms to compute the forward and inverse discrete Fourier transforms (DFT) of a sequence with linear-phase characteristic are examined. These reduce the computational requirements relative to a complex FFT by large factors and should be used whenever applicable. The case when the DFT coefficients are real-valued leads to further reductions in computational requirements. Though the redundancy in the linear-phase situation is exactly 50%, the computational requirements and implementation are quite different from the real-valued FFT which uses a similar symmetry relation. The code for such implementations can be easily written by simple restructuring of a complex FFT algorithm.

Journal ArticleDOI
TL;DR: The authors study the optimization of, and present several rules for, the signal sampling in a PSD, using FFT, and the results are useful in reducing the influence of stationary and periodic interferences.
Abstract: The rapid development of data acquisition and processing products provides an opportunity for the use of the fast Fourier transform (FFT) in a digital phase sensitive detector (PSD), where the measurement quality is largely dependent on the signal sampling. The error analysis on this scheme is related to the general error analysis of FFT, but there exist some essential differences between the two. The authors study the optimization of, and present several rules for, the signal sampling in a PSD, using FFT. The results are useful in reducing the influence of stationary and periodic interferences.

Journal ArticleDOI
TL;DR: A delay matrix D is derived and used along with the exponential Fourier operational matrix of integration in a new algorithm for parameter identification of LTI delayed systems, which reduces the computing time considerably and gives accurate parameter estimates.
Abstract: A delay matrix D is derived and used along with the exponential Fourier operational matrix of integration in a new algorithm for parameter identification of LTI delayed systems. The main advantage of this method over similar algorithms is that the fast Fourier transform (FFT) can be employed for determining expansion coefficients. Therefore, it reduces the computing time considerably. A second advantage is that the Fourier delay and integration matrices are simpler than their counterparts associated with other orthogonal functions. This further reduces computations. An example is given which shows that the algorithm gives accurate parameter estimates.

01 May 1991
TL;DR: The main features of the algorithm are that it requires no interprocessor communications, and that it maps to the degree of parallelism of the target multiprocessor flexibly.
Abstract: One of the major issues of parallel processing is the design of algorithms that minimize interprocessor communications. This research addresses this issue with respect to the parallel computation of the multidimensional discrete Fourier transform. A tensor product formulation for multidimensional Cooley-Tukey type fast Fourier transform algorithms is developed. The formulation depends on the expression of data flow by the coset decomposition of the underlying index set. This representation allows fast algorithms to be designed by algebraic manipulations of the tensor product formulation. A new reduced transform algorithm for the computation of the multidimensional discrete Fourier transform is developed. The algorithm computes a d-dimensional discrete Fourier transform by a set of independent k-dimensional discrete Fourier transforms ($k < d$); it is a reduction algorithm in the sense that it has lowered the dimension of the Fourier transforms that are computed. The k-dimensional discrete Fourier transforms are performed on data derived from the input using only additions, and produce k-dimensional hyperplanes of the output array. The major contribution of this research is an intrinsically parallel algorithm for the computation of the d-dimensional DFT. The main features of the algorithm are that it requires no interprocessor communications, and that it maps to the degree of parallelism of the target multiprocessor flexibly. The mapping of the algorithm onto architectures with broadcast and report capabilities is given. Expressions are obtained for estimating the speed on these machines as a function of the size of the d-dimensional DFT, the bandwidth C of the communications channel, the time A for an addition, the time T(FFT) for a single processing element to perform a k-dimensional DFT ($k < d$), and the degree of parallelism of the machine. 
For single I/O channel machines that are capable of exploiting the full degree of parallelism of the algorithm, execution times as low as the time required to compute a single k-dimensional DFT plus the I/O time for data upload and download are attainable.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A novel algorithm for computing the discrete Fourier transform (DFT) is presented which is obtained through decomposition of the Fourier matrix representing the DFT operator into a product of sparse matrices which are not all square matrices.
Abstract: A novel algorithm for computing the discrete Fourier transform (DFT) is presented. This fast Fourier transform (FFT) algorithm is obtained through decomposition of the Fourier matrix representing the DFT operator into a product of sparse matrices which are not all square matrices. The algorithm is based on additive properties of the input and output indexing sets of the Fourier transformation. Mathematical formulations of the algorithm are presented using tensor product algebra. Properties of this algebra are used to assist in the adaptation of the algorithm to the DSP96002 microprocessor architecture. This results in efficient implementations which take into account the inherent software and hardware features of the microprocessor.

Proceedings ArticleDOI
12 Mar 1991
TL;DR: It is shown that an FFT/NTT computed with n bits yielded equivalent performance to a 2^m-point FFT which processed n+m bits.
Abstract: Radar applications of the multiple radix fast Fourier number theoretic transform (FFT/NTT) are discussed. The FFT/NTT performs a prime length discrete Fourier transform (DFT) efficiently and accurately using NTTs. The FFT/NTT produces an output without any internal truncation or rounding; therefore, it follows that use of the FFT/NTT for radar processing should yield improvements in the radar's detection performance over some FFT implementations. It is shown that an FFT/NTT computed with n bits yielded equivalent performance to a 2^m-point FFT which processed n+m bits. The FFT/NTT can be easily reconfigured to process additional input bits, thus allowing for increased clutter rejection.

Proceedings ArticleDOI
11 Jun 1991
TL;DR: A fast algorithm for computing the two-dimensional discrete cosine transform (2-D DCT) is proposed which produces a regular structure which makes it attractive for VLSI implementation.
Abstract: A fast algorithm for computing the two-dimensional discrete cosine transform (2-D DCT) is proposed. In this algorithm the 2-D DCT is converted into a form of 2-D DFT (discrete Fourier transform) which is called the odd DFT. The odd DFT can be calculated by a DFT followed by post-multiplications. The DFT part of the odd DFT is calculated by the fast discrete Radon transform. The complexity of the proposed algorithm is comparable to that of the polynomial transform approach. This new algorithm produces a regular structure which makes it attractive for VLSI implementation. Furthermore, the computation can be performed in parallel.

Proceedings ArticleDOI
C. Lu
11 Jun 1991
TL;DR: This work reviews the split-radix FFT algorithm for 2^k transform sizes, the multiplicative algorithms for prime transform sizes, and the prime factor algorithm for transform sizes with relatively prime factors.
Abstract: Multiply-add FFT algorithms are FFT algorithms that take advantage of computer architectures with a multiply-add feature. Various FFT algorithms can be implemented on this type of architecture to give the multiplications for free. In the present work, some of these FFT algorithms are reviewed: the split-radix FFT algorithm for 2^k transform sizes, the multiplicative algorithms for prime transform sizes, and the prime factor algorithm for transform sizes with relatively prime factors. Both complex and real data sequences are considered, and operational counts are evaluated in terms of total floating-point operations. Tensor product formulation is used throughout for producing variants of algorithms matching to computer architecture.