
Showing papers on "Prime-factor FFT algorithm" published in 2007


Journal ArticleDOI
TL;DR: In this brief, multi-path delay commutator structures are utilized to improve the throughput rate of radix-2 and radix-4 FFT computation by a factor of 2 to 4.
Abstract: In this brief, multi-path delay commutator structures are utilized to improve the throughput rate of radix-2 and radix-4 FFT computation by a factor of 2 to 4. Latency can also be reduced by a factor of 2 to 3. Compared with previous radix-2 and radix-4 FFT structures, the proposed high-throughput FFT with doubled throughput rate requires similar or even less hardware cost. Although split-radix FFT design is more hardware efficient, the regular structure of the proposed FFT structures is attractive for high-throughput FFT design.

81 citations


Journal ArticleDOI
Hyun-Yong Lee1, In-Cheol Park1
TL;DR: The proposed algorithm is to decompose a discrete Fourier transform into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored into tables.
Abstract: This paper presents an area-efficient algorithm for the pipelined processing of fast Fourier transform (FFT). The proposed algorithm is to decompose a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored into tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length to make the transform lengths of sub-DFTs resulting from the decomposition as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-2^2 algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator of twiddle factor and add/subtract units combined with the two's complement operation.

79 citations


Journal ArticleDOI
TL;DR: The use of three typical convolutions, two convolution theorems, influence coefficients, and shape functions, as well as the influence of domain size are discussed.

71 citations


Journal ArticleDOI
TL;DR: This work empirically evaluates, for the first time, a recently proposed Fast Approximate Discrete Fourier Transform (FADFT) algorithm, FADFT-2, and shows that FADFT-2 not only generally outperforms FADFT-1 on all but the sparsest signals, but is also significantly faster than FFTW 3.1 on large sparse signals.
Abstract: In this paper we empirically evaluate a recently proposed Fast Approximate Discrete Fourier Transform (FADFT) algorithm, FADFT-2, for the first time. FADFT-2 returns approximate Fourier representations for frequency-sparse signals and works by random sampling. Its implementation is benchmarked against two competing methods. The first is the popular exact FFT implementation FFTW Version 3.1. The second is an implementation of FADFT-2’s ancestor, FADFT-1. Experiments verify the theoretical runtimes of both FADFT-1 and FADFT-2. In doing so it is shown that FADFT-2 not only generally outperforms FADFT-1 on all but the sparsest signals, but is also significantly faster than FFTW 3.1 on large sparse signals. Furthermore, it is demonstrated that FADFT-2 is indistinguishable from FADFT-1 in terms of noise tolerance despite FADFT-2’s better execution time.

61 citations


Journal ArticleDOI
TL;DR: A fast algorithm that performs a discrete-time discrete-scale approximation of the continuous-time transform, with subquadratic asymptotic complexity, based on a well-known relation between the Mellin and Fourier transforms.
Abstract: A fast algorithm for the discrete-scale (and β-Mellin) transform is proposed. It performs a discrete-time discrete-scale approximation of the continuous-time transform, with subquadratic asymptotic complexity. The algorithm is based on a well-known relation between the Mellin and Fourier transforms, and it is practical and accurate. The paper gives some theoretical background on the Mellin, β-Mellin, and scale transforms. Then the algorithm is presented and analyzed in terms of computational complexity and precision. The effects of different interpolation procedures used in the algorithm are discussed.

56 citations


Journal ArticleDOI
TL;DR: This tutorial simply reviews the DFT and FFT, with a few characteristic examples.
Abstract: Frequency analysis is an important issue in the IEEE. Using a computer in a calculation means moving into a non-physical, synthetic environment. Numerically, discrete or fast Fourier transformations (DFTs or FFTs) are used to obtain the frequency content of a time signal, and these are totally different than the mathematical definition of the Fourier transform. This tutorial simply reviews the DFT and FFT, with a few characteristic examples.

44 citations


Proceedings ArticleDOI
Liu, Wang, Xi, Guo, Peng 
01 Jan 2007
TL;DR: This work uses the similarity of image features in the Laplacian pyramid as weights for image denoising and presents an acceleration algorithm to break the bottleneck of the non-local means algorithm: the similarity computation between comparison windows.
Abstract: In the paper, we propose a robust and fast image denoising method. The approach integrates both the non-local means algorithm and the Laplacian pyramid. Given an image to be denoised, we first decompose it into a Laplacian pyramid. Exploiting the redundancy property of the Laplacian pyramid, we then perform non-local means on every level image of the Laplacian pyramid. Essentially, we use the similarity of image features in the Laplacian pyramid as weights to denoise the image. Since the features extracted in the Laplacian pyramid are localized in spatial position and scale, they are much better able to describe the image, and computing the similarity between them is more reasonable and more robust. Also, based on the efficient Summed Square Image (SSI) scheme and the Fast Fourier Transform (FFT), we present an acceleration algorithm to break the bottleneck of the non-local means algorithm: the similarity computation between comparison windows. After speedup, our algorithm is fifty times faster than the original non-local means algorithm. Experiments demonstrate the effectiveness of our algorithm.

44 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that the polar and pseudo-polar FFT can be computed very accurately and efficiently by the well-known nonequispaced FFT, and the reconstruction of a 2D signal from its Fourier transform samples on a (pseudo)polar grid by means of the inverse nonequispaced FFT is discussed.

43 citations


Journal ArticleDOI
TL;DR: The proposed design, a new hardware efficient fast cyclic convolution algorithm for small-length DFT, can save a large amount of hardware cost at the same processing speed when the transform length is long, and the processing speed can be flexible and balanced with the hardware cost.
Abstract: A prime N-length discrete Fourier transform (DFT) can be reformulated into an (N-1)-length complex cyclic convolution and then implemented by systolic array or distributed arithmetic. In this paper, a recently proposed hardware efficient fast cyclic convolution algorithm is combined with the symmetry properties of DFT to get a new hardware efficient fast algorithm for small-length DFT, and then WFTA is used to control the increase of the hardware cost when the transform length N is large. Compared with previously proposed low-cost DFT and FFT algorithms with computation complexity of O(log N), the new algorithm can save 30% to 50% multipliers on average and improve the average processing speed by a factor of 2, when DFT length N varies from 20 to 2040. Compared with previous prime-length DFT design, the proposed design can save a large amount of hardware cost with the same processing speed when the transform length is long. Furthermore, the proposed design has many more choices for different applicable DFT transform lengths and the processing speed can be flexible and balanced with the hardware cost.

34 citations
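
The reformulation mentioned in the abstract above (a prime-length DFT rewritten as an (N-1)-length cyclic convolution) is commonly known as Rader's algorithm. The following is a minimal software sketch of that index mapping only, not the paper's hardware design: the function names are illustrative, the generator is found by brute force, and the convolution is evaluated directly rather than by a fast cyclic convolution algorithm.

```python
import cmath

def primitive_root(p):
    """Brute-force search for a generator of the multiplicative group mod p.
    Assumes p is an odd prime; other inputs are not validated."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no generator found (an odd prime length is expected)")

def cyclic_convolve(a, b):
    """Direct O(M^2) cyclic convolution of two length-M sequences."""
    m = len(a)
    return [sum(a[q] * b[(p - q) % m] for q in range(m)) for p in range(m)]

def rader_dft(x):
    """DFT of a prime-length sequence via an (N-1)-point cyclic convolution."""
    n = len(x)
    g = primitive_root(n)
    g_inv = pow(g, n - 2, n)  # modular inverse of g by Fermat's little theorem
    perm = [pow(g, q, n) for q in range(n - 1)]          # g^q mod n
    perm_inv = [pow(g_inv, q, n) for q in range(n - 1)]  # g^(-q) mod n
    a = [x[i] for i in perm]                             # permuted input
    b = [cmath.exp(-2j * cmath.pi * i / n) for i in perm_inv]  # permuted twiddles
    conv = cyclic_convolve(a, b)
    out = [0j] * n
    out[0] = sum(x)                  # the DC term is handled separately
    for p, k in enumerate(perm_inv):
        out[k] = x[0] + conv[p]      # X[g^(-p)] = x[0] + (a * b)[p]
    return out

def naive_dft(x):
    """Reference O(N^2) DFT used only for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

For example, `rader_dft([1, 2, 3, 4, 5, 6, 7])` should agree with `naive_dft` on the same input up to rounding error.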


Journal ArticleDOI
TL;DR: A general class of split-radix fast Fourier transform (FFT) algorithms for computing the length-2^m DFT is proposed by introducing a new recursive approach coupled with an efficient method for combining the twiddle factors, and it is shown that the number of arithmetic operations required is independent of s and is (2m-3)2^(m+1)+8.
Abstract: In this paper, a general class of split-radix fast Fourier transform (FFT) algorithms for computing the length-2^m DFT is proposed by introducing a new recursive approach coupled with an efficient method for combining the twiddle factors. This enables the development of higher split-radix FFT algorithms from lower split-radix FFT algorithms without any increase in the arithmetic complexity. Specifically, an arbitrary radix-2/2^s FFT algorithm for any value of s, 4 ≤ s ≤ m, is proposed and its arithmetic complexity analyzed. It is shown that the number of arithmetic operations (multiplications plus additions) required by the proposed radix-2/2^s FFT algorithm is independent of s and is (2m-3)2^(m+1)+8 regardless of whether a complex multiplication is carried out using four multiplications and two additions or three multiplications and three additions. This paper thus provides a variety of choices and ways for computing the length-2^m DFT with the same arithmetic complexity.

33 citations
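
As a quick arithmetic check on the operation count quoted above, the sketch below reads the count as (2m-3)·2^(m+1)+8 and compares it with the classical split-radix count 4N lg N - 6N + 8 for N = 2^m. The function names are illustrative; nothing here implements the radix-2/2^s algorithm itself.

```python
def radix_2_2s_ops(m):
    """Operation count from the abstract, read as (2m - 3) * 2^(m+1) + 8."""
    return (2 * m - 3) * 2 ** (m + 1) + 8

def split_radix_ops(n):
    """Classical split-radix count 4 n lg n - 6 n + 8, for n a power of two (n > 1)."""
    lg = n.bit_length() - 1
    return 4 * n * lg - 6 * n + 8

# The two expressions agree for every power-of-two length tried here.
for m in range(1, 21):
    assert radix_2_2s_ops(m) == split_radix_ops(2 ** m)
```

Expanding (2m-3)·2^(m+1) gives 4m·2^m - 6·2^m, which is why the two forms coincide.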


Patent
04 Apr 2007
TL;DR: Techniques are described for computing a Fast Fourier Transform (FFT), an Inverse FFT (IFFT), or both, using an FFT engine with one or more registers and a delayless pipeline that receives a multi-point input from main memory.
Abstract: Techniques for performing Fast Fourier Transforms (FFT) are described. In some aspects, calculating the Fast Fourier Transform is achieved with an apparatus having a memory (610), a Fast Fourier Transform engine (FFTe) having one or more registers (650) and a delayless pipeline (630), the FFTe configured to receive a multi-point input from the main memory (610), store the received input in at least one of the one or more registers (650), and compute either or both of a Fast Fourier Transform (FFT) and an Inverse Fast Fourier Transform (IFFT) on the input using the delayless pipeline.

Proceedings ArticleDOI
01 Nov 2007
TL;DR: In this paper, a group-harmonic weighting distribution is proposed for system-wide interharmonic evaluation in power systems, which can restore the dispersing spectral leakage energy caused by the fast Fourier transform.
Abstract: The fast Fourier transform (FFT) is still a widely-used tool for analyzing and measuring both stationary and transient signals with power system harmonics in power systems. However, the misapplications of FFT can lead to incorrect results caused by some problems such as aliasing effect, spectral leakage and picket-fence effect. A strategy of group-harmonic weighting distribution is proposed for system-wide inter-harmonic evaluation in power systems. The proposed algorithm can restore the dispersing spectral leakage energy caused by the fast Fourier transform (FFT), and calculate the power distribution proportion around the adjacent frequencies at each harmonic to determine the inter-harmonic frequency. Therefore, not only high-precision in integer harmonic measurement by the FFT can be retained, but also the inter-harmonics can be identified accurately, particularly under system frequency drift. The numerical examples are presented to verify the performance of the proposed algorithm.
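
The spectral leakage described above can be reproduced in a few lines. The sketch below is only an illustration of leakage and of summing the power of neighbouring bins; it is not the paper's group-harmonic weighting method, and the function names, the 64-sample frame, and the plus/minus one bin group width are assumptions made for the example.

```python
import cmath
import math

def naive_dft(x):
    """Reference O(N^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def tone(cycles, n):
    """Unit-amplitude cosine with the given number of cycles per frame."""
    return [math.cos(2 * math.pi * cycles * t / n) for t in range(n)]

def bin_amplitudes(x):
    """Single-sided amplitude estimate per bin (meaningful for bins 1 .. n/2 - 1)."""
    n = len(x)
    return [2 * abs(v) / n for v in naive_dft(x)[: n // 2]]

def grouped_amplitude(amps, center, half_width=1):
    """Root-sum-square of the bins around `center`, collecting leaked energy."""
    lo = max(center - half_width, 0)
    group = amps[lo : center + half_width + 1]
    return math.sqrt(sum(a * a for a in group))

coherent = bin_amplitudes(tone(5.0, 64))  # whole number of cycles: no leakage
leaky = bin_amplitudes(tone(5.3, 64))     # frequency drift: energy spreads out
print(coherent[5], leaky[5], grouped_amplitude(leaky, 5))
```

With a whole number of cycles the peak bin reads the true amplitude, while the drifted tone reads noticeably low in its peak bin; grouping the adjacent bins recovers part of the dispersed energy.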

Book ChapterDOI
TL;DR: The tangent FFT is presented, a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk's algorithm, and it is pinpointed how the tangent FFT saves time compared to the split-radix FFT.
Abstract: The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n-6n+8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk's software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the "tangent FFT," a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk's algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk's improvement but also for minimizing the memory-access costs of the FFT.

Journal ArticleDOI
TL;DR: This work gives a relatively short survey of the FFT for arbitrary finite abelian groups, cyclic or not, with complete and partially novel proofs, the main distinction being explicit induction formulas for the FFT in all cases which generalize the original FFT-algorithm.
Abstract: Fast Fourier transforms (FFTs) are fast algorithms, i.e., of low complexity, for the computation of the discrete Fourier transform (DFT) on a finite abelian group. They are among the most important algorithms in applied and engineering mathematics and in computer science, in particular for one- and multidimensional systems theory and signal processing. We give a relatively short survey of the FFT for arbitrary finite abelian groups, cyclic or not, with complete and partially novel proofs, the main distinction being explicit induction formulas for the FFT in all cases which generalize the original FFT-algorithm due to Cooley and Tukey and, much earlier, to Gauss. We believe that our approach has didactic advantages over the usual ones. We also present the application of the FFT to fast convolution algorithms, and the so-called number theoretic transforms over finite coefficient rings. We do not treat those algorithms which decrease the multiplicative complexity at the expense of many more rational linear combinations, which in this context are considered costless, nor do we discuss the DFT for nonabelian finite groups.

Journal ArticleDOI
TL;DR: The SAD metric is approximated by a cosine series that can be expressed in correlation terms computable with FFT algorithms; the resulting algorithm is suitable for software implementations and, unlike existing fast algorithms for SAD matching, has a deterministic execution time.
Abstract: Fast Fourier transforms (FFTs), which are O(N log N) algorithms to compute a discrete Fourier transform (DFT) of size N, have been called one of the ten most important algorithms of the twentieth century. However, even though many algorithms have been developed to speed up the computation of sum of absolute difference (SAD) matching, they are exclusively designed in the spatial domain. In this paper, we propose a fast frequency-domain algorithm to speed up the process of SAD matching. We use a new approach to approximate the SAD metric by a cosine series which can be expressed in correlation terms. These terms can be computed using FFT algorithms. Experimental results demonstrate the effectiveness of our method when using only the first correlation terms for block and template matching in terms of accuracy and speed. The proposed algorithm is suitable for software implementations and has a deterministic execution time unlike the existing fast algorithms for SAD matching.

Patent
25 Apr 2007
TL;DR: A method of reducing noise in a speech signal is described in which the signal is converted to the frequency domain with a fast Fourier transform (FFT), gains are determined for a subset of spectral subbands and interpolated to the number of FFT points, and the filtered spectrum is returned to the time domain with an inverse FFT.
Abstract: A method of reducing noise in a speech signal involves converting the speech signal to the frequency domain using a fast Fourier transform (FFT), creating a subset of selected spectral subbands, determining the appropriate gain for each subband, and interpolating the gains to match the number of FFT points. The converted speech signal is then filtered using the interpolated gains as filter coefficients, and an inverse FFT performed on the processed signal to recover the time domain output signal.
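
The processing chain in the abstract above (FFT, per-subband gains, interpolation of the gains to the FFT size, filtering, inverse FFT) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how the gains are determined, so they are passed in as a hypothetical list, linear interpolation is assumed, and the gains are mirrored across the spectrum so that a real input frame yields a real output frame.

```python
import cmath

def dft(x, sign=-1):
    """O(N^2) DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def interpolate_gains(subband_gains, n_points):
    """Linearly interpolate a short list of subband gains to n_points values."""
    m = len(subband_gains)
    if m == 1:
        return [subband_gains[0]] * n_points
    out = []
    for i in range(n_points):
        pos = i * (m - 1) / (n_points - 1) if n_points > 1 else 0.0
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append((1 - frac) * subband_gains[lo] + frac * subband_gains[lo + 1])
    return out

def denoise_frame(frame, subband_gains):
    """Filter one real-valued frame in the frequency domain."""
    n = len(frame)
    spectrum = dft(frame)
    half = n // 2 + 1                       # bins 0 .. n/2 carry the distinct gains
    g_half = interpolate_gains(subband_gains, half)
    # Mirror the gains so the filtered spectrum stays conjugate-symmetric.
    gains = [g_half[k] if k < half else g_half[n - k] for k in range(n)]
    filtered = [g * s for g, s in zip(gains, spectrum)]
    return [v.real / n for v in dft(filtered, sign=+1)]
```

With all gains set to 1 the frame passes through unchanged, which is a convenient sanity check before plugging in a real gain rule.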

Journal ArticleDOI
TL;DR: A fast algorithm for arbitrary order of polynomial time frequency transforms to significantly reduce the computational complexity is derived based on the split-radix concept.
Abstract: The polynomial time frequency transform is one of the important tools for estimating the coefficients of polynomial-phase signals (PPSs) with the maximum likelihood method. The transform converts a one-dimensional (1-D) data sequence into a multidimensional output array from which the phase coefficients of the data sequence are estimated. A prohibitive computational load is generally needed for high-order polynomial-phase signals although the 1-D fast Fourier transform (FFT) algorithm can be used. Based on the split-radix concept, this paper derives a fast algorithm for arbitrary order of polynomial time frequency transforms to significantly reduce the computational complexity. Comparisons of the computational complexity needed by various algorithms are also made to show the merits of the proposed algorithm.

Journal ArticleDOI
TL;DR: An alternative way of refining phases with the origin-free modulus sum function S is shown that, instead of applying the tangent formula in sequential mode, applies it in parallel mode with the help of the fast Fourier transform (FFT) algorithm.
Abstract: An alternative way of refining phases with the origin-free modulus sum function S is shown that, instead of applying the tangent formula in sequential mode [Rius (1993). Acta Cryst. A49, 406-409], applies it in parallel mode with the help of the fast Fourier transform (FFT) algorithm. The test calculations performed on intensity data of small crystal structures at atomic resolution prove the convergence and hence the viability of the procedure. This new procedure called S-FFT is valid for all space groups and especially competitive for low-symmetry ones. It works well when the charge-density peaks in the crystal structure have the same sign, i.e. either positive or negative.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: An efficient algorithm using parallel and pipelining methods is proposed to implement a high-speed, high-resolution FFT on FPGA.
Abstract: Using fast Fourier transform (FFT) is indispensable in most signal processing applications. Designing an appropriate algorithm for the implementation of FFT can be efficacious in digital signal processing. Sophisticated techniques such as pipelining and parallel calculations have potential impacts on VLSI implementation of FFT algorithm. Furthermore, a mathematic approach such as floating point calculation achieves higher precision. In this paper, an efficient algorithm using parallel and pipelining methods is proposed to implement a high-speed and high-resolution FFT algorithm. Latency reduction is an important issue in implementing the high-speed FFT on FPGA. The proposed FFT algorithm shows a latency of 5131 clock pulses for N = 1024 points. The design has a mean squared error (MSE) of 0.0001, which is preferable to radix-2 FFT.

Journal ArticleDOI
TL;DR: In this paper, a low complexity pipeline FFT processor for MIMO-OFDM systems with four transmitting and four receiving (4 × 4) antennas is proposed, based on a multi-channel structure that supports multiple data streams efficiently.
Abstract: In this paper, we propose a low complexity pipeline FFT processor for MIMO-OFDM systems with four transmitting and four receiving (4 × 4) antennas. The proposed FFT processor is based on a multi-channel structure which supports multiple data streams efficiently. With a mixed-radix algorithm, the number of non-trivial multiplications of the proposed FFT processor is decreased. Implementation results show that the proposed FFT processor reduces the required number of logic gates by 25% over the conventional 4-channel R4MDC FFT processor which has been considered to be the most area-efficient FFT processor for 4 × 4 MIMO-OFDM systems.

Journal ArticleDOI
TL;DR: In this paper, an efficient algorithm is presented to analyze the electromagnetic scattering by electrically large-scale dielectric objects, which is based on the multi-region and quasi-edge buffer (MR-QEB) iterative scheme and the conjugate gradient (CG) method combined with the fast Fourier transform (FFT).
Abstract: In this paper, an efficient algorithm is presented to analyze the electromagnetic scattering by electrically large-scale dielectric objects. The algorithm is based on the multi-region and quasi-edge buffer (MR-QEB) iterative scheme and the conjugate gradient (CG) method combined with the fast Fourier transform (FFT). This algorithm is done by dividing the computational domain into small sub-regions and then solving the problem in each sub-region with buffer area using the CG-FFT method. Considering the spurious edge effects, local quasi-edge buffer regions are used to suppress these unwanted effects and ensure the stability. With the aid of the CG-FFT method, the proposed algorithm is very efficient, and can solve very large-scale problems which cannot be solved using the conventional CG-FFT method in a personal computer. The accuracy and efficiency of the proposed algorithm are verified by comparing numerical results with analytical Mie-series solutions for dielectric spheres.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This work rigorously derives a novel variant of the general-radix Cooley-Tukey FFT that is structured to map efficiently for any vector length v and radix and includes the new FFT into the program generator Spiral to generate actual C implementations.
Abstract: SIMD (single instruction multiple data) vector instructions, such as Intel's SSE family, are available on most architectures, but are difficult to exploit for speed-up. In many cases, such as the fast Fourier transform (FFT), signal processing algorithms have to undergo major transformations to map efficiently. Using the Kronecker product formalism, we rigorously derive a novel variant of the general-radix Cooley-Tukey FFT that is structured to map efficiently for any vector length v and radix. Then, we include the new FFT into the program generator Spiral to generate actual C implementations. Benchmarks on Intel's SSE show that the new algorithms perform better on practically all sizes than the best available libraries Intel's MKL and FFTW.
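
For readers unfamiliar with the general-radix Cooley-Tukey FFT named above, one decomposition step for length N = m·n can be written as a stride permutation, n-point DFTs, a twiddle scaling, and m-point DFTs; in Kronecker notation this is usually written DFT_mn = (DFT_m ⊗ I_n) T (I_m ⊗ DFT_n) L. The sketch below shows only that textbook step with naive sub-DFTs; it is not the vectorized variant derived in the paper, and the function names are illustrative.

```python
import cmath

def naive_dft(x):
    """Reference O(N^2) DFT, also used for the two sub-transforms."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def cooley_tukey_step(x, m, n):
    """One general-radix Cooley-Tukey step for a list x of length m * n."""
    size = m * n
    if len(x) != size:
        raise ValueError("len(x) must equal m * n")
    # Stride permutation followed by m DFTs of length n:
    # row t1 holds the DFT of x[t1], x[t1 + m], x[t1 + 2m], ...
    rows = [naive_dft(x[t1::m]) for t1 in range(m)]
    # Twiddle factors exp(-2*pi*i * t1 * k2 / N) between the two stages.
    for t1 in range(m):
        for k2 in range(n):
            rows[t1][k2] *= cmath.exp(-2j * cmath.pi * t1 * k2 / size)
    # n DFTs of length m across the rows; output index is k2 + n * k1.
    out = [0j] * size
    for k2 in range(n):
        col = naive_dft([rows[t1][k2] for t1 in range(m)])
        for k1 in range(m):
            out[k2 + n * k1] = col[k1]
    return out
```

Calling `cooley_tukey_step(x, 3, 4)` or `cooley_tukey_step(x, 4, 3)` on a 12-point list should both reproduce `naive_dft(x)` up to rounding.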

Journal ArticleDOI
TL;DR: A grouped scheme is presented for computing the pruning fast Fourier transform (pruning FFT) with power-of-two partial transformation length; using the radix-2 FFT scheme, it can be implemented with shared hardware and a regular structure.

Patent
11 Sep 2007
TL;DR: A variable length fast Fourier transform (FFT) system, which includes a memory and a number of processing elements, and a method for operating it in global navigation satellite system (GNSS) signal acquisition and tracking are disclosed.
Abstract: A variable length fast Fourier transform (FFT) system, which includes a memory and a number of processing elements, and a method for operating the FFT system in global navigation satellite system (GNSS) signal acquisition and tracking are disclosed. Based on the GNSS signal tracking, the variable length FFT system performs a first FFT operation together with a first data length. Based on the GNSS signal acquisition, the variable length FFT system is divided into several FFT subsystems to simultaneously perform different operations with various data lengths different from the first data length. Thus, the variable length FFT system can enhance the hardware utility and increase throughputs.

Journal ArticleDOI
TL;DR: The novel aspects of the specific FFT method described include: a bit-wise reversal re-grouping operation of the conventional FFT is replaced by the use of lossless image rotation and scaling and the usual arithmetic operations of complex multiplication are replaced with integer addition.
Abstract: The Fourier transform is one of the most important transformations in image processing. A major component of this influence comes from the ability to implement it efficiently on a digital computer. This paper describes a new methodology to perform a fast Fourier transform (FFT). This methodology emerges from considerations of the natural physical constraints imposed by image capture devices (camera/eye). The novel aspects of the specific FFT method described include: 1) a bit-wise reversal re-grouping operation of the conventional FFT is replaced by the use of lossless image rotation and scaling and 2) the usual arithmetic operations of complex multiplication are replaced with integer addition. The significance of the FFT presented in this paper is introduced by extending a discrete and finite image algebra, named Spiral Honeycomb Image Algebra (SHIA), to a continuous version, named SHIAC

Journal Article
Sun Jing1
TL;DR: In this paper, the authors proposed a power quality analysis method based on Mallat algorithm and fast Fourier transform (FFT) to distinguish steady state disturbance from non-steady state disturbance.
Abstract: Based on Mallat algorithm and fast Fourier transform (FFT), the authors propose a power quality analysis method. In this method, the wavelet denoising is applied to sampled signals; according to the detection results of catastrophe point of signals, the high frequency coefficients of the first level and the second level obtained by Mallat decomposition algorithm are taken as the criteria to distinguish steady state disturbance from non-steady state disturbance, and then the duration of disturbance can be solved. In the light of frequency band division principle of multi-resolution analysis, by use of Mallat reconstruction algorithm the transient disturbance waveform is extracted, moreover an identification subroutine that can accurately distinguish short-term variation disturbances such as voltage sag, voltage swell and interruption is programmed. For steady state disturbance, the authors point out that FFT can be used as a tool to distinguish harmonics from flicker. The effectiveness and accuracy of the proposed method is validated by Matlab-based simulation results.

Journal ArticleDOI
TL;DR: Simulations show that the precision is quite comparable, but in the case investigated the computing performance is considerably higher for DFT than FFT, and the application to image simulation for the mission Gaia and for Extremely Large Telescopes is discussed.
Abstract: Image computation is a fundamental tool for performance assessment of astronomical instrumentation, usually implemented by Fourier transform techniques. We review the numerical implementation, evaluating a direct implementation of the discrete Fourier transform (DFT) algorithm, compared with fast Fourier transform (FFT) tools. Simulations show that the precision is quite comparable, but in the case investigated the computing performance is considerably higher for DFT than FFT. The application to image simulation for the mission Gaia and for Extremely Large Telescopes is discussed.

Patent
20 Aug 2007
TL;DR: In this paper, a plurality of communication bursts are transmitted substantially simultaneously in a time slot of a time division duplex/code division multiple access communication system, and a channel response is determined for each of the K midamble shifts using a prime factor algorithm (PFA) discrete Fourier transform (DFT) algorithm, the received combined signal and the P by P square circulant matrix.
Abstract: A plurality of communication bursts are transmitted substantially simultaneously in a time slot of a time division duplex/code division multiple access communication system. The communication system has a maximum number of K midamble shifts. Each burst has an assigned midamble. Each midamble is a shifted version of a basic midamble code having a period of P. A combined signal is received. The combined signal includes a received version of each of the communication burst's midambles. A P by P square circulant matrix is constructed including the K midamble shifts. A channel response is determined for each of the K midamble shifts using a prime factor algorithm (PFA) discrete Fourier transform (DFT) algorithm, the received combined signal and the P by P square circulant matrix. The PFA DFT algorithm has a plurality of stages. Each stage has P inputs.
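
The prime factor algorithm (PFA) named above, also known as the Good-Thomas algorithm, splits a DFT of length N = N1·N2 with coprime N1 and N2 into a two-dimensional DFT with no twiddle factors between the stages. The sketch below illustrates that index mapping in software for two factors only; it is not the multi-stage, P-input structure of the patent, and the function names are illustrative.

```python
import cmath
from math import gcd

def naive_dft(x):
    """Reference O(N^2) DFT, also used for the row and column transforms."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def pfa_dft(x, n1, n2):
    """Good-Thomas prime-factor DFT for len(x) == n1 * n2 with gcd(n1, n2) == 1."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1 * n2 with coprime n1 and n2")
    # Input map: grid position (i1, i2) reads x[(i1 * n2 + i2 * n1) mod n].
    grid = [[x[(i1 * n2 + i2 * n1) % n] for i2 in range(n2)] for i1 in range(n1)]
    # Length-n2 DFT along each row; grid[i1][k2] afterwards.
    grid = [naive_dft(row) for row in grid]
    # Length-n1 DFT along each column; cols[k2][k1] afterwards.
    cols = [naive_dft([grid[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]
    # Output map (Chinese remainder theorem): X[k] sits at (k mod n1, k mod n2).
    return [cols[k % n2][k % n1] for k in range(n)]
```

For instance, `pfa_dft(list(range(15)), 3, 5)` should match `naive_dft(list(range(15)))` up to rounding; note that, unlike a Cooley-Tukey step, no twiddle multiplication appears between the row and column transforms.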

Journal ArticleDOI
TL;DR: The present method can be implemented in the microprocessor and VLSI environment using a commercial FFT chip and yields energy preserving and shift invariant decimated analytic wavelet coefficients, which are free of aliasing effects.
Abstract: This letter introduces an analytic wavelet transform based on linear phase quadrature mirror filters (QMFs). The computation of the analytic signal and the reconstruction of the signal are carried out by a fast Fourier transform (FFT)-based algorithm. The transform yields energy preserving and shift invariant decimated analytic wavelet coefficients, which are free of aliasing effects. The present method can be implemented in the microprocessor and VLSI environment using a commercial FFT chip.

Proceedings ArticleDOI
04 Dec 2007
TL;DR: A hardware interpretation is introduced to design a highly parallel and parameterized architecture of the cyclotomic FFT, based on four stages with a modular last stage, which reaches a very high throughput rate of up to 8.5 fc for a 256-point FFT.
Abstract: The hardware design and implementation of cyclotomic Fast Fourier Transform (FFT) over finite fields GF(2^m) is described. By reformulating the algorithm presented in [8], we introduce a hardware interpretation to design a highly parallel and parameterized architecture of the cyclotomic FFT. Based on four stages and the modular structure of the last stage, this architecture can operate at different throughput rates. Compared to another implemented algorithm [9] which operates at fc (the system clock frequency), the proposed architecture reaches a very high throughput rate which, for a 256-point FFT, can attain 8.5 fc. An FPGA implementation of the proposed architecture is given where the critical path delay and the hardware complexity are evaluated.