
Showing papers on "Split-radix FFT algorithm published in 1994"


Journal ArticleDOI
TL;DR: The “fractional Fourier transform,” previously developed by the authors, is applied to this problem with a substantial savings in computation.
Abstract: The fast Fourier transform (FFT) is often used to compute numerical approximations to continuous Fourier and Laplace transforms. However, a straightforward application of the FFT to these problems often requires a large FFT to be performed, even though most of the input data to this FFT may be zero and only a small fraction of the output data may be of interest. In this note, the “fractional Fourier transform,” previously developed by the authors, is applied to this problem with a substantial savings in computation.
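The abstract does not state the transform's definition. The sketch below assumes the form G_k = sum_j x_j exp(-2*pi*i*j*k*alpha) for an arbitrary real alpha, and evaluates it with a chirp (Bluestein-style) convolution built on ordinary FFTs. The function names are hypothetical and the code is an illustration under that assumption, not the authors' implementation.

```python
import numpy as np

def frft_direct(x, alpha):
    """O(m^2) reference: G_k = sum_j x_j * exp(-2*pi*i*j*k*alpha) (assumed definition)."""
    x = np.asarray(x, dtype=complex)
    j = np.arange(len(x))
    return np.exp(-2j * np.pi * alpha * np.outer(j, j)) @ x

def frft_fast(x, alpha):
    """Same sums in O(m log m), using 2jk = j^2 + k^2 - (k - j)^2."""
    x = np.asarray(x, dtype=complex)
    m = len(x)
    j = np.arange(m)
    chirp = np.exp(-1j * np.pi * alpha * j**2)      # exp(-pi*i*alpha*j^2)
    n = 1
    while n < 2 * m - 1:                            # pad so the circular convolution does not wrap
        n *= 2
    y = np.zeros(n, dtype=complex)
    y[:m] = x * chirp
    z = np.zeros(n, dtype=complex)
    z[:m] = np.conj(chirp)                          # lags 0 .. m-1
    z[n - m + 1:] = np.conj(chirp[1:][::-1])        # lags -(m-1) .. -1, stored at the end
    conv = np.fft.ifft(np.fft.fft(y) * np.fft.fft(z))
    return chirp * conv[:m]
```

With alpha = 1/m the sums reduce to the ordinary DFT of length m; other values of alpha sample the spectrum on a finer or coarser grid without zero-padding the input to a large FFT.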

141 citations


Journal ArticleDOI
TL;DR: A simple iterative FFT-based algorithm for spectrum estimation is presented; it separates closely spaced sinusoids better than a single pass through the FFT.
Abstract: A simple FFT-based algorithm for spectrum estimation is presented. The major difference between this and spectrum estimation using a single pass through the FFT is that the proposed algorithm is iterative and the FFT is used many times in a systematic way to search for individual spectral lines. Using simulated data, the proposed algorithm is able to detect multiple sinusoids in additive noise. The algorithm is certainly better than the single pass FFT in separating closely spaced sinusoids. Finally the algorithm is applied to some experimental measurements to illustrate its properties.

123 citations


Journal ArticleDOI
TL;DR: This modification of Temperton's (1991) self-sorting, in-place radix-p FFT algorithm reduces the required temporary working space from order of $p^2$ to $p+1$, providing a better match to the limited number of registers in a CPU.
Abstract: Presents a modification of Temperton's (1991) self-sorting, in-place radix-p FFT algorithm. This modification reduces the required temporary working space from order of $p^2$ to $p+1$, providing a better match to the limited number of registers in a CPU.

50 citations


Proceedings ArticleDOI
A. Saidi1
19 Apr 1994
TL;DR: A new fast Fourier transform algorithm, decimation-in-time-frequency (DITF) FFT algorithm, which reduces the number of real multiplications and additions, and is extended to radix-R FFT as well as the multidimensional FFT algorithm using the vector-radix FFT.
Abstract: A new fast Fourier transform algorithm is presented. The decimation-in-time (DIT) and the decimation-in-frequency (DIF) FFT algorithms are combined to introduce a new FFT algorithm, decimation-in-time-frequency (DITF) FFT algorithm, which reduces the number of real multiplications and additions. The DITF FFT algorithm reduces the arithmetic complexity while using the same computational structure as the conventional Cooley-Tukey (CT) FFT algorithm. The algorithm is extended to radix-R FFT as well as the multidimensional FFT algorithm using the vector-radix FFT.
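For orientation, the two conventional radix-2 forms that the abstract combines can be sketched recursively as follows. This is a minimal illustration of standard DIT and DIF for power-of-two lengths, not the DITF algorithm itself; the function names are hypothetical.

```python
import numpy as np

def fft_dit(x):
    """Radix-2 decimation in time: split the INPUT into even and odd samples."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x.copy()
    even = fft_dit(x[0::2])
    odd = fft_dit(x[1::2])
    w = np.exp(-2j * np.pi * np.arange(n // 2) / n)   # twiddles applied after the sub-FFTs
    return np.concatenate([even + w * odd, even - w * odd])

def fft_dif(x):
    """Radix-2 decimation in frequency: split the OUTPUT into even and odd bins."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x.copy()
    half = n // 2
    w = np.exp(-2j * np.pi * np.arange(half) / n)     # twiddles applied before the sub-FFTs
    out = np.empty(n, dtype=complex)
    out[0::2] = fft_dif(x[:half] + x[half:])          # even-indexed bins
    out[1::2] = fft_dif((x[:half] - x[half:]) * w)    # odd-indexed bins
    return out
```

Both routines use the same butterfly count; they differ only in whether the twiddle multiplication sits before or after the half-length transforms.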

47 citations


Proceedings ArticleDOI
14 Nov 1994
TL;DR: It is shown that the multi-dimensional formulation of the proposed FFT algorithm helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node.
Abstract: Proposes a parallel high-performance fast Fourier transform (FFT) algorithm based on a multi-dimensional formulation. We use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. We show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.
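A common way to cast a 1-D FFT in multi-dimensional form is the textbook "four-step" factorization for N = n1 * n2, sketched below. Whether the paper uses exactly this factorization is an assumption; `fft_four_step` is a hypothetical name.

```python
import numpy as np

def fft_four_step(x, n1, n2):
    """1-D DFT of length n1*n2 computed as short FFTs on a 2-D array.

    Index maps: input n = n2*r + c, output k = k1 + n1*k2.
    """
    x = np.asarray(x, dtype=complex)
    if x.size != n1 * n2:
        raise ValueError("len(x) must equal n1 * n2")
    a = x.reshape(n1, n2)                              # a[r, c] = x[n2*r + c]
    b = np.fft.fft(a, axis=0)                          # n2 FFTs of length n1 -> b[k1, c]
    k1 = np.arange(n1)[:, None]
    c = np.arange(n2)[None, :]
    b = b * np.exp(-2j * np.pi * k1 * c / (n1 * n2))   # twiddle factors W_N^(c*k1)
    d = np.fft.fft(b, axis=1)                          # n1 FFTs of length n2 -> d[k1, k2]
    return d.T.reshape(-1)                             # flatten so that index = k1 + n1*k2
```

On a distributed-memory machine the two batches of short FFTs are local to a node when the array is distributed by columns and then by rows, so communication is concentrated in the transposition between them.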

45 citations


Proceedings ArticleDOI
19 Apr 1994
TL;DR: An algorithm is developed, called the quick Fourier transform (QFT), that will reduce the number of floating point operations necessary to compute the DFT by a factor of two or four over direct methods or Goertzel's method for prime lengths.
Abstract: This paper will look at an approach that uses symmetric properties of the basis function to remove redundancies in the calculation of discrete Fourier transform (DFT). We will develop an algorithm, called the quick Fourier transform (QFT), that will reduce the number of floating point operations necessary to compute the DFT by a factor of two or four over direct methods or Goertzel's method for prime lengths. Further by applying the idea to the calculation of a DFT of length $2^M$, we construct a new O(N log N) algorithm. The algorithm can be easily modified to compute the DFT with only a subset of input points, and it will significantly reduce the number of operations when the data are real. The simple structure of the algorithm and the fact that it is well suited for DFTs on real data should lead to efficient implementations and to a wide range of applications.
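The kind of saving that basis-function symmetry gives can be illustrated with a direct DFT that pairs sample n with sample N-n: the cosine is even and the sine is odd about N/2, so each pair needs only one cosine product and one sine product. This is a sketch of the symmetry idea only, not the QFT as published; `dft_symmetric` is a hypothetical name.

```python
import numpy as np

def dft_symmetric(x):
    """Direct DFT using cos(2*pi*(N-n)*k/N) = cos(2*pi*n*k/N) and sin(...) = -sin(...)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    half = (n - 1) // 2                    # number of (m, n-m) sample pairs
    m = np.arange(1, half + 1)
    k = np.arange(n)
    x_even = x[m] + x[n - m]               # part that multiplies the cosine
    x_odd = x[m] - x[n - m]                # part that multiplies the sine
    angle = 2 * np.pi * np.outer(k, m) / n
    X = x[0] + np.cos(angle) @ x_even - 1j * (np.sin(angle) @ x_odd)
    if n % 2 == 0:
        X = X + x[n // 2] * (-1.0) ** k    # unpaired middle sample for even lengths
    return X
```

The trigonometric tables here have about N/2 columns instead of N, which is where the factor of two over the plain direct method comes from.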

26 citations


Journal ArticleDOI
TL;DR: The algorithmic structures of the DIT and the DIF algorithms are shown to be equally applicable for the real-valued algorithms by systematic modifications and the computational complexity is fairly comparable with other available fast algorithms.
Abstract: The decimation-in-time (DIT) and the decimation-in-frequency (DIF) algorithms are the typical forms of the fast Fourier transform (FFT) algorithm. Many hardware and software implementations are based on these algorithms. One class of fast algorithms for computing the discrete Fourier transform (DFT) is based on a recursive factorization of the polynomial $1-z^N$. This paper introduces a simple recursive factorization of $1-z^N$ over the real numbers and a mathematical framework that generalizes the form of the DFT. Using the recursive factorization, efficient algorithms are derived to compute the DFT and the cyclic convolution of sequences whose length is a power of two. Real-valued DIT and real-valued DIF algorithms are developed so that the accumulated FFT technologies can be fully utilized for real sequences. Introducing a real-valued butterfly, the algorithmic structures of the DIT and the DIF algorithms are shown to be equally applicable for the real-valued algorithms by systematic modifications. The computational complexity is fairly comparable with other available fast algorithms.

20 citations


Proceedings ArticleDOI
01 May 1994
TL;DR: This paper presents the first single chip dedicated to the computation of direct or inverse fast Fourier transforms of up to 8192 complex points, developed mainly for the validation of the Single Frequency Network concept in a COFDM digital terrestrial television system.
Abstract: This paper presents the first single chip dedicated to the computation of direct or inverse fast Fourier transforms of up to 8192 complex points. Due to its pipelined architecture, it can perform an 8K FFT every 400 μs and a 1K FFT every 50 μs. A new internal results scaling scheme has been introduced in order to optimize the SNR and to minimize the storage requirements. This component has been developed mainly for the validation of the Single Frequency Network concept in a COFDM digital terrestrial television system.

19 citations


Journal ArticleDOI
TL;DR: This paper proposes a bit-reversal algorithm that reduces the computational effort to an extent that it becomes negligible compared with the data swapping operation for which the bit-reversal is required.
Abstract: The necessity for an efficient bit-reversal routine in the implementation of fast discrete Fourier transform algorithms is well known. In this paper, we propose a bit-reversal algorithm that reduces the computational effort to an extent that it becomes negligible compared with the data swapping operation for which the bit-reversal is required.
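To make the two costs concrete (computing the reversed indices versus swapping the data), here is a minimal sketch of a table-driven bit-reversal permutation. It uses a standard recurrence, not the algorithm proposed in the paper, and the function names are hypothetical.

```python
def bit_reverse_indices(n_bits):
    """Table of bit-reversed indices for 0 .. 2**n_bits - 1.

    Recurrence: reverse(i) is reverse(i >> 1) shifted right by one, with the
    lowest bit of i moved into the highest position.
    """
    size = 1 << n_bits
    rev = [0] * size
    for i in range(1, size):
        rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (n_bits - 1))
    return rev

def bit_reverse_permute(data):
    """Reorder a power-of-two-length list in place into bit-reversed order."""
    n = len(data)
    n_bits = n.bit_length() - 1
    if n < 1 or (1 << n_bits) != n:
        raise ValueError("length must be a power of two")
    rev = bit_reverse_indices(n_bits)
    for i in range(n):
        j = rev[i]
        if i < j:                      # swap each pair exactly once
            data[i], data[j] = data[j], data[i]
    return data
```

The index computation costs one shift, one mask, and one OR per element, so the swaps in `bit_reverse_permute` dominate the running time.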

15 citations


Journal ArticleDOI
01 Mar 1994
TL;DR: It is shown that these basefield transforms can be viewed as "projections" of the discrete Fourier transform (DFT) and that many of the existing real Hartley algorithms are projections of well-known FFT algorithms.
Abstract: We present a general framework for constructing transforms in the field of the input which have a convolution-like property. The construction is carried out over the reals, but is shown to be valid over more general fields. We show that these basefield transforms can be viewed as "projections" of the discrete Fourier transform (DFT). Furthermore, by imposing an additional condition on the projections, one may obtain self-inverse versions of the basefield transforms. Applying the theory to the real and complex fields, we show that the projection of the complex DFT results in the discrete combinational Fourier transform (DCFT) and that the imposition of the self-inverse condition on the DCFT yields the discrete Hartley transform (DHT). Additionally, we show that the method of projection may be used to derive efficient basefield transform algorithms by projecting standard FFT algorithms from the extension field to the basefield. Using such an approach, we show that many of the existing real Hartley algorithms are projections of well-known FFT algorithms.

14 citations


Proceedings ArticleDOI
30 May 1994
TL;DR: In this paper the sliding implementations of other useful transforms, which can also be implemented with order-N complexity, are worked out in detail.
Abstract: Implementation of transform domain adaptive filters is addressed. Recent results have shown that if the input data to a radix-2 fast Fourier transform (FFT) structure is sliding one sample at a time, only N-1 butterflies need to be calculated for updating the FFT structure after the arrival of every new data sample. This is in contrast to most previous reports, which assume order N log N complexity for such an implementation. In this paper the sliding implementations of other useful transforms, which can also be implemented with order-N complexity, are worked out in detail.
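The order-N update cost can be illustrated with the classic sliding-DFT recurrence, a different and simpler technique than the butterfly-reuse scheme the abstract refers to: when the window advances by one sample, every bin is corrected for the departing and arriving samples and then rotated. `sliding_dft` is a hypothetical name.

```python
import numpy as np

def sliding_dft(signal, n):
    """DFT of every length-n window of `signal`, updated in O(n) per new sample.

    Recurrence: X_k(t+1) = (X_k(t) - x[t] + x[t+n]) * exp(2*pi*i*k/n).
    """
    signal = np.asarray(signal, dtype=complex)
    rotation = np.exp(2j * np.pi * np.arange(n) / n)
    spectrum = np.fft.fft(signal[:n])          # full FFT only for the first window
    spectra = [spectrum]
    for t in range(len(signal) - n):
        spectrum = (spectrum - signal[t] + signal[t + n]) * rotation
        spectra.append(spectrum)
    return np.array(spectra)
```

Rounding errors accumulate in this recurrence over long runs, so practical implementations typically recompute a full transform from time to time.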

Journal ArticleDOI
TL;DR: It is shown that the discrete short-time Fourier transform with temporal decimation (DSTFT-TD) can be evaluated using a variety of pruned FFT structures and further computational savings can be achieved by combining overlap pruning with classical frequency pruning.
Abstract: We show that the discrete short-time Fourier transform with temporal decimation (DSTFT-TD) can be evaluated using a variety of pruned FFT structures. A pruning method we refer to as overlap pruning can be used to eliminate computational overlap between consecutive FFT's for computing slices of the DSTFT-TD. When only a limited frequency range of the DSTFT-TD is of interest, further computational savings can be achieved by combining overlap pruning with classical frequency pruning. We evaluate the complexity of the overlap and frequency pruned FFT's for the DSTFT-TD in terms of the number of complex multiplications and additions required for the computation of each DSTFT-TD slice.

Proceedings ArticleDOI
09 Nov 1994
TL;DR: The paper concentrates on Vector-Radix-algorithms, which decimate and transform a 2D data set simultaneously for both index directions and therefore seem suitable for parallelization.
Abstract: Dynamic development in digital signal processing is inseparably bound to the disclosure of the fast Fourier transform (FFT). Implications from the application of these efficient algorithms for calculating the discrete (inverse) Fourier transform are significant in many ways. Applicability of FFT algorithms ranges far into almost every aspect of physics and performs a central role in analysis, design and implementation of DSP algorithms and digital systems. Consumed computer time almost ceases to be a problem when using FFT compared with straightforward discrete Fourier transform (DFT). The cutdown on consumed computer time by usage of FFT algorithms holds even greater promise for multidimensional applications, which in general have more complex tasks and heavier data loads to cope with. Without multidimensional FFT algorithms for high speed convolution or spectral analysis the successes for example in SAR, tomography, data compression or picture processing could not have been achieved. Since the introduction of the Cooley-Tukey-algorithm in 1965 methods to calculate the two- or N dimensional Fourier transform of a set of data are based essentially on the separability of the 2D FFT. With a 1D FFT algorithm the data set is `combed' row- and columnwise to form the 2D transform of the calculated 1D transforms. After some basics and recalling some different conventional approaches to 1D and 2D Fourier transform the paper concentrates on Vector-Radix-algorithms which decimate and transform a 2D data set simultaneously for both index directions and therefore seem suitable for parallelization. Vector-Radix-approaches are derived for general radices and for the 2D case also for nonquadratic data sets.
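The conventional row-column approach that the abstract contrasts with vector-radix methods can be sketched in a few lines. This shows only the separable baseline, not a vector-radix algorithm; `fft2_row_column` is a hypothetical name.

```python
import numpy as np

def fft2_row_column(a):
    """2-D DFT via separability: 1-D FFTs along every row, then along every column."""
    a = np.asarray(a, dtype=complex)
    rows_done = np.fft.fft(a, axis=1)      # transform each row
    return np.fft.fft(rows_done, axis=0)   # transform each column of the result
```

The array does not need to be square, and the order of the two passes can be exchanged without changing the result.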

Journal ArticleDOI
TL;DR: In this article, the problem of attenuation correction in 3D imaging by a confocal scanning laser microscope (CSLM) in the (epi)fluorescence mode was reformulated as a statistical estimation problem.
Abstract: Recently we developed a new method for attenuation correction in 3D imaging by a confocal scanning laser microscope (CSLM) in the (epi)fluorescence mode. The fundamental element in our approach consisted of multiplying the measured fluorescent intensity by a correction factor involving a convolution integral of this intensity, which can be computed efficiently by the fast Fourier transform (FFT). The resulting algorithm is one or two orders of magnitude faster than an existing iterative method, but it was found to have a somewhat smaller accuracy. In this paper we improve on this latter point by reformulating the problem as a statistical estimation problem. In particular, we derive first-order-moment and cumulant estimators leading to a nonlinear integral equation for the unknown fluorescent density, which is solved by an iterative method in which in each step a discrete convolution is performed by using the FFT. We find that only a few iterations are needed. It is shown that the estimators proposed here are more accurate than the existing iterative method, while they retain the advantage in computational efficiency of the FFT-based approach.

Proceedings ArticleDOI
31 Oct 1994
TL;DR: Fast Fourier transform (FFT) arrays with built-in error correction are proposed, and a time shared TMR scheme is used to achieve the error correcting capability.
Abstract: Fast Fourier transform (FFT) arrays with built-in error correction are proposed. A time shared TMR scheme is used to achieve the error correcting capability. A quarter of the original FFT array is triplicated and voted in each stage. Therefore the hardware complexity of the error correcting FFT array is a little more than 75% of the original FFT array. This is significant since the error correcting design is smaller than the original. The price for this hardware reduction is that the delay time increases by a factor of 4. However, the throughput penalty can be minimized by pipelining. A technology-independent gate-level analysis of hardware complexity and delay time is included.

Journal ArticleDOI
TL;DR: This algorithm applies a 2-D matrix factorization technique in a 2-D space and offers a way to do 2-D FFT in both dimensions simultaneously and can be extended to M-D cases for M>2.
Abstract: A new 2-D FFT algorithm is described. This algorithm applies a 2-D matrix factorization technique in a 2-D space and offers a way to do 2-D FFT in both dimensions simultaneously. The computation is greatly reduced compared to traditional algorithms. This will improve the realization of a 2-D FFT on any kind of computer. However its good parallelism will especially benefit an implementation on a computer with hypercube architecture. A good arrangement of parallel processors will save a great deal of running time. Furthermore this algorithm can be extended to M-D cases for M>2.

Proceedings ArticleDOI
20 Mar 1994
TL;DR: Using the properties of serial operators, the authors propose a way to approach the accuracy of a wide arithmetic unit without giving up much of the computation-time advantage of a narrow one, and propose an architecture to implement it in VLSI.
Abstract: The Fourier transform is widely used for its properties and because the fast Fourier transform (FFT) algorithm has sped up its computation. Each step of the FFT introduces an error caused first by the quantization of the sine and cosine coefficients, and second by the necessarily limited growth in the size of the results. The roundoff phenomenon has much more important effects than the coefficient imprecision. The arithmetic unit can treat numbers either sufficiently large to maximize the accuracy or smaller to minimize the area, in the case of parallel operators, or the computation time, in the case of serial operators. Using the properties of the serial operators, the authors propose a way to approach the first without going too far from the second and propose an architecture to implement it in VLSI.

Book ChapterDOI
06 Sep 1994
TL;DR: The algorithm considered is the radix r self-sorting algorithm which does not require additional data reordering stages (digit-reversal) as this process is inherently carried out during the execution of the algorithm.
Abstract: In this work we present a study of the vectorization of the fast Fourier transform. The algorithm we have considered is the radix r self-sorting algorithm which does not require additional data reordering stages (digit-reversal) as this process is inherently carried out during the execution of the algorithm. For obtaining the vectorized version of the algorithm we employ a formulation of the FFT in terms of an operator string. Each of the operators represents an operation over the data flow of the algorithm and will have a direct implementation on the vectorial processor. The algorithm thus obtained has been implemented on the Fujitsu VP-2400/10 vector computer, resulting in reduced execution times.

Journal ArticleDOI
TL;DR: A variable order method for the fast and accurate computation of the Fourier transform is presented and the increase in accuracy is achieved by applying corrections to the trapezoidal sum approximations obtained by the FFT method.
Abstract: In this paper, a variable order method for the fast and accurate computation of the Fourier transform is presented. The increase in accuracy is achieved by applying corrections to the trapezoidal sum approximations obtained by the FFT method. It is shown that the additional computational work involved is of order K(2m+2), where m is a small integer and K≤n. Analytical expressions for the associated error are also given.
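The uncorrected starting point, a uniform-grid sum evaluated by one FFT, can be sketched as below. For a function that is negligible at the ends of the grid this sum coincides with the trapezoidal rule. The convention F(nu) = integral of f(t) exp(-2*pi*i*nu*t) dt, the even grid size, and the function name are assumptions for illustration; the paper's correction terms are not implemented.

```python
import numpy as np

def fourier_transform_uniform_sum(f, n, h):
    """Approximate F(nu) on nu_k = k/(n*h) from samples f(j*h), j = -n/2 .. n/2-1.

    Computes h * sum_j f(t_j) * exp(-2*pi*i*nu_k*t_j) with a single FFT.
    n is assumed even so that t = 0 falls on a grid point.
    """
    j = np.arange(n) - n // 2
    t = j * h
    nu = j / (n * h)
    samples = f(t)
    # ifftshift moves t = 0 to index 0; fftshift puts nu = 0 back in the middle.
    F = h * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(samples)))
    return nu, F
```

For a Gaussian the sum is accurate to rounding error; for functions with slowly decaying tails or discontinuities the error is much larger, which is the situation that end corrections to the trapezoidal sum address.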

Journal ArticleDOI
TL;DR: This work uses visual analysis to describe the EEG contents in terms of frequencies, amplitudes, phases, and unique waveforms as they evolve over time to explore the relationship between the visual analysis of the EEG and the FFT.
Abstract: There are several methods available to neurophysiologic technologists for the analysis of EEGs. Visual analysis is a technique that is comfortable and familiar to us. We use it to describe the EEG contents in terms of frequencies, amplitudes, phases, and unique waveforms as they evolve over time. The compressed spectral array (CSA) and the fast Fourier transform (FFT) are alternative analysis techniques that are becoming more common for monitoring in intensive care units and in the operating room. Additionally, with the advent of digital EEG, FFTs are available on some newer instruments. My own problem in trying to comprehend the relationship between the visual analysis of the EEG and the FFT has been my inability to get my hands on the FFT in a practical sense. I was comfortable with the general concept of transforms, but I really wanted to make small models for myself to explore the FFT. In the process of searching for methods of making small models, I began to read about the discrete Fourier ...

Proceedings ArticleDOI
25 Oct 1994
TL;DR: The Battle-Lemarie scaling function is used in an algorithm for fast computation of the Fourier transform of a piecewise smooth function f and an application of this algorithm to image processing is considered.
Abstract: We use the Battle-Lemarie scaling function in an algorithm for fast computation of the Fourier transform of a piecewise smooth function f. Namely, we compute for -N

Book ChapterDOI
02 Dec 1994
TL;DR: In this article, the algorithm is based on an exact relation, due to Cooley, Lewis and Welch, between the Discrete Fourier Transform and the periodic sums associated with a function and its Fourier transform, in a similar way as in the Poisson summation formula.
Abstract: The algorithm is based on an exact relation, due to Cooley, Lewis and Welch, between the Discrete Fourier Transform and the periodic sums, associated with a function and its Fourier Transform in a similar way as in the Poisson summation formula. It makes use of several equidistant grids, with the same number of points covering m different symmetric intervals of length L, 2L, 4L, 8L,…, where it applies FFT and spline interpolation to the midpoints of the grid.

Proceedings ArticleDOI
06 Apr 1994
TL;DR: Algorithms for unordered Parallel Fast Fourier transform (PFFT) pairs with radices 2 and mixed radix (4-2) for distributed memory machines are presented and theoretical estimates of the computation costs in each case demonstrate that the higher-radix FFT is more efficient for parallel implementation.
Abstract: Algorithms for unordered Parallel Fast Fourier transform (PFFT) pairs with radices 2 and mixed radix (4-2) for distributed memory machines are presented. Distributed memory versions using distance one and distance two communications are derived. Theoretical estimates of the computation costs in each case demonstrate that the higher-radix FFT is more efficient for parallel implementation. Furthermore, distance-two communication strategies can minimize communication cost when certain architecture dependent parameters are satisfied.
Introduction. In this paper, power-of-two (PO2) Fast Fourier transforms are considered for implementation on the power-of-two topology hypercube. PO2 FFTs can be classified into two groups: unordered and ordered. The unordered PO2 PFFT algorithms produce an output data sequence which is the bit-reverse of the input data sequence. The ordered PO2 PFFT algorithms generate identical input and output data sequences. In this paper we focus on PO2 unordered PFFT algorithms, and, in particular, unordered PFFT pairs consisting of a forward and inverse transform. It has already been demonstrated by Woo and Renaut [6] that the most efficient radix-2 PFFT algorithms minimize the number of communications, the communication distance, and the data packet size. It is known that, in scalar mode, radix-2 FFT algorithms require more computation than radix-4 and mixed-radix (4-2) FFT algorithms. Is this still the case in parallel mode? Furthermore, as commonly assumed, is distance two communication required? Here we show that indeed higher radix has the potential to be more efficient and further can be implemented in distance one. The definition of scalar FFT can be found in many text books and papers such as [1, 3]. Parallel implementations are designed with the aid of the sequence to processor map which is explained in the following section. Distance-1 algorithms and their time complexities are explained and compared in the later sections. Conclusions follow at the end.
Design of PO2 PFFT Algorithms. The distribution of the data points among the processors plays a very important role in reducing the number of communications for the FFT. Here an interleaved or wraparound data assignment is used. For this mapping, the data ids in each processor are defined by $\mathrm{data\_id} = me + P \cdot j$, $j = 0, \ldots, 2^{n-d} - 1$, where $P = 2^d$ is the number of processors and $me$ is the processor id. This is most easily implemented via the interleaved sequence-to-processor map as introduced by Woo and Renaut [4, 2]: $z_m = (i_d, \ldots, i_{n-1} \mid i_0, \ldots, i_{d-1})$. (1) Here the element id $m = i_{n-1} \ldots i_0$ (binary) and the partition $\mid$ has been introduced to separate the address on the left from the processor number on the right. With this notation the element $z_m$ has address $i_{n-1} \ldots i_d$ in processor $i_{d-1} \ldots i_0$. To implement the PFFT, we use index-digit permutations of $z_m$ described by the r-element i-cycle and partial-exchange i-cycles, [3] and [4, 5, 7], respectively.
Definition. An r-element i-cycle is an index-digit permutation of $z_m$ in which the pivot group is exchanged with any group of r digits, either in the address or the processor number. Here, the pivot is a group of r digits.
Definition. An r-element partial-exchange i-cycle is an index-digit permutation of $z_m$ in which the pivot is exchanged only with any group of r digits in the processor number position. Here, the pivot is a group of r digits.
All the i-cycles satisfy the following properties:
(1) The communication distance is equal to the number of different bits in the processor id (in binary form).
(2) The communication length is equal to $2^{n-d-v}$, where $v$ is the number of different bits in address id (binary form).
(3) Any element in the address position can be the pivot.
(4) Any exchange in address only means that the data point sequence in each processor must be rearranged and no interprocessor communication is involved.
Note that the distance-one and distance-two communication strategies presented here use r = 1 and r = 2 in the above definitions.
Distance 1 PO2 Parallel FFT Pairs. Algorithms. The algorithm for the radix-2 PFFT pair which assigns data in wraparound order was first introduced by Woo [4, 2]. It uses $d$ distance-1 communications with packets of size $2^{n-d-1}$ data points and, as such, minimizes communication
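The interleaved (wraparound) data assignment described in this excerpt, data_id = me + P*j with P = 2^d processors and 2^n data points, can be made concrete with a short sketch. The function names are hypothetical, and the code covers only the data layout, not the communication steps.

```python
def interleaved_layout(n, d):
    """Data ids held by each of P = 2**d processors for 2**n points.

    Processor `me` holds data ids me + P*j for j = 0 .. 2**(n-d) - 1.
    """
    num_procs = 1 << d
    per_proc = 1 << (n - d)
    return {me: [me + num_procs * j for j in range(per_proc)]
            for me in range(num_procs)}

def locate(m, d):
    """Split element id m (binary i_{n-1} ... i_0) into (local address, processor).

    The low d bits give the processor number and the remaining high bits give
    the address within that processor.
    """
    return m >> d, m & ((1 << d) - 1)
```

For example, with n = 4 and d = 2, processor 1 holds elements 1, 5, 9, and 13, and element 13 (binary 1101) sits at local address 3 (binary 11) on processor 1 (binary 01).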

01 Sep 1994
TL;DR: The algorithm presented is intended for use in the solution of partial differential equations, or in any situation in which a large number of forward and backward transforms must be performed and in which the Fourier Coefficients need not be ordered.
Abstract: This report deals with parallel algorithms for computing discrete Fourier transforms of real sequences of length N not equal to a power of two. The method described is an extension of existing power of two transforms to sequences with N a product of small primes. In particular, this implementation requires $N = 2^p 3^q 5^r$. The communication required is the same as for a transform of length $N = 2^p$. The algorithm presented is intended for use in the solution of partial differential equations, or in any situation in which a large number of forward and backward transforms must be performed and in which the Fourier coefficients need not be ordered. This implementation is a one dimensional FFT but the techniques are applicable to multidimensional transforms as well. The algorithm has been implemented on a 128 node Intel iPSC/860.
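A caller choosing a transform length for such an implementation needs to know whether N has only the factors 2, 3, and 5. The helpers below are a hypothetical convenience sketch, not part of the report; they assume any of the exponents p, q, r may be zero.

```python
def is_supported_length(n):
    """True if n = 2**p * 3**q * 5**r for non-negative integers p, q, r."""
    if n < 1:
        return False
    for prime in (2, 3, 5):
        while n % prime == 0:
            n //= prime
    return n == 1

def next_supported_length(n):
    """Smallest supported length that is >= n (useful for choosing a grid size)."""
    candidate = max(n, 1)
    while not is_supported_length(candidate):
        candidate += 1
    return candidate
```

For instance, 360 = 2^3 * 3^2 * 5 is supported, while a grid of 97 points would have to be enlarged to 100.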

Proceedings ArticleDOI
23 May 1994
TL;DR: The paper completes the pipelined design of the original phase-rotation FFT, provides a fundamental new description of the algorithm directly in terms of the parallel pipeline, and describes a radix-2 implementation on the iWarp computer system that balances computation and communication to run at the full bandwidth of the communications links, regardless of the input data set size.
Abstract: The phase-rotation FFT is a new form of the FFT that replaces data movement with multiplications by constant phasor multipliers. The result is an FFT that is simple to pipeline. The paper completes the pipelined design of the original phase-rotation FFT, provides a fundamental new description of the algorithm directly in terms of the parallel pipeline, and describes a radix-2 implementation on the iWarp computer system that balances computation and communication to run at the full bandwidth of the communications links, regardless of the input data set size.

Proceedings ArticleDOI
13 Nov 1994
TL;DR: This work presents an original two-stage MD FFT algorithm where in the first stage the signal is processed by multiplier-free butterflies in such a way that at the second stage the computation only needs 1D FFTs.
Abstract: This work presents an original two-stage MD FFT algorithm where in the first stage the signal is processed by multiplier-free butterflies in such a way that at the second stage the computation only needs 1D FFTs. The proposed method is more efficient than any other MD FFT algorithm known to the authors.

Journal ArticleDOI
TL;DR: A pipelined ring algorithm is presented for efficient computation of one and two dimensional Fast Fourier Transform (FFT) on a message passing multiprocessor and experiments reveal that the algorithm is very efficient.