
Showing papers on "Split-radix FFT algorithm published in 2009"


Journal Article
TL;DR: A novel, adaptable and accurate method for calculating the polar FFT and log-polar FFT, named the multilayer fractional Fourier transform (MLFFT), is developed in this paper; it provides a mechanism to increase accuracy by increasing a user-defined computing level.
Abstract: A novel, adaptable and accurate method for calculating the polar FFT and log-polar FFT, named the multilayer fractional Fourier transform (MLFFT), is developed in this paper. MLFFT is a necessary addition to the pseudo-polar FFT for the following reasons: it has lower interpolation errors in both polar and log-polar Fourier transforms; it reaches better accuracy with nearly the same computational complexity as the pseudo-polar FFT; and it provides a mechanism to increase accuracy by increasing the user-defined computing level. This paper demonstrates MLFFT and its advantages both theoretically and experimentally. Focusing on applications of MLFFT to image registration under rotation and scaling, our experiments suggest two major advantages: 1) MLFFT can recover scaling factors up to 5 with arbitrary rotation angles, or up to 10 without rotation, whereas current state-of-the-art algorithms recover a maximum scaling factor of 4; 2) MLFFT needs no iteration to recover large rotation and scaling values, so it is more efficient than pseudo-polar-based FFT methods for image registration.

103 citations
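Here is a rough illustration of the building block named in this entry. In the pseudo-polar FFT literature, the "fractional Fourier transform" is usually a DFT with a non-integer frequency scale α. This is the Bailey-Swarztrauber sense, not the optical fractional Fourier transform. Assuming that definition, the direct O(N²) sketch below evaluates it. Fast implementations compute the same sum in O(N log N) with chirp-z convolutions. The multilayer scheme of MLFFT itself is not reproduced, and the name `fractional_dft` is hypothetical.

```python
import cmath

def fractional_dft(x, alpha):
    """Direct O(N^2) fractional DFT: X[k] = sum_n x[n] * exp(-2*pi*i*alpha*n*k/N).

    alpha = 1 gives the ordinary DFT; alpha < 1 samples the spectrum on a finer
    grid (spacing alpha * 2*pi/N), the kind of non-integer frequency sampling
    that polar and log-polar grids require.
    """
    n_len = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * alpha * n * k / n_len) for n in range(n_len))
        for k in range(n_len)
    ]

# Usage: zoom into the low-frequency part of the spectrum of a short signal.
signal = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5]
zoomed = fractional_dft(signal, 0.25)  # bins spaced a quarter of a DFT bin apart
```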


Proceedings Article
24 May 2009
TL;DR: The design of a highly configurable continuous flow mixed-radix (CFMR) Fast Fourier Transform (FFT) processor is presented; it utilizes a flexible addressing scheme to enable runtime configuration of the FFT length from 16 to 4096 points.
Abstract: The design of a highly configurable continuous flow mixed-radix (CFMR) Fast Fourier Transform (FFT) processor is presented. It computes fixed-point complex FFTs and inverse FFTs (IFFTs), and utilizes a flexible addressing scheme to enable runtime configuration of the FFT length from 16 to 4096 points. A configurable block floating point (BFP) unit improves numerical performance. Compared with a floating-point Matlab FFT function, the proposed architecture achieves an accuracy of 80 dB for a 64-point FFT and 74 dB for a 1024-point FFT with random complex input data.

41 citations
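The abstract does not describe how its configurable BFP unit works. The generic block-floating-point sketch below makes two assumptions: one shared exponent per block, and signed integer mantissas of a hypothetical width `mantissa_bits`.

```python
import math

def bfp_quantize(block, mantissa_bits=12):
    """Quantize a block of complex samples to block floating point.

    All samples share one exponent chosen from the largest real/imaginary
    magnitude in the block; each component becomes a signed integer mantissa
    that fits in `mantissa_bits` bits (sign included).
    """
    max_mant = 2 ** (mantissa_bits - 1) - 1
    peak = max(max(abs(z.real), abs(z.imag)) for z in block)
    if peak == 0:
        return 0, [(0, 0) for _ in block]
    # Smallest exponent such that peak / 2**exponent still fits the mantissa range.
    exponent = math.ceil(math.log2(peak / max_mant))
    scale = 2.0 ** exponent
    mantissas = [(round(z.real / scale), round(z.imag / scale)) for z in block]
    return exponent, mantissas

def bfp_dequantize(exponent, mantissas):
    """Rebuild complex samples from the shared exponent and integer mantissas."""
    scale = 2.0 ** exponent
    return [complex(re * scale, im * scale) for re, im in mantissas]

block = [complex(0.3, -0.7), complex(-1.9, 0.05), complex(0.0, 1.2)]
exp_, mants = bfp_quantize(block)
restored = bfp_dequantize(exp_, mants)
```

In a fixed-point FFT pipeline, the shared exponent is typically re-evaluated after each stage. This keeps the growing butterfly outputs from overflowing the mantissa range.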


Journal Article
TL;DR: Interpolating formulas for the modulus-based interpolated FFT are presented for up to nine points, for cosine windows with maximum sidelobe decay; the expression is general in both the window order and the number of interpolating points.

39 citations
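The simplest member of the family these formulas generalize is the two-point interpolated FFT with a Hann window. The Hann window is the first-order cosine window with maximum sidelobe decay. The sketch below assumes a single dominant real tone and a periodic Hann window. It uses the classical ratio formula δ = (2α − 1)/(α + 1), not the paper's nine-point expressions, and a direct per-bin DFT stands in for an FFT.

```python
import cmath
import math

def dft_bin(x, k):
    """Single DFT bin X[k], evaluated directly in O(N)."""
    n = len(x)
    return sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))

def hann_interpolated_frequency(x):
    """Estimate the frequency (in bins) of the dominant tone in x.

    Applies a periodic Hann window, finds the peak bin k, and interpolates
    toward the larger neighbour with delta = (2a - 1) / (a + 1), where a is
    |neighbour| / |peak|.
    """
    n = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    xw = [xi * wi for xi, wi in zip(x, w)]
    mags = [abs(dft_bin(xw, k)) for k in range(n // 2)]
    k = max(range(1, n // 2 - 1), key=lambda i: mags[i])
    if mags[k + 1] >= mags[k - 1]:
        a = mags[k + 1] / mags[k]
        return k + (2 * a - 1) / (a + 1)
    a = mags[k - 1] / mags[k]
    return k - (2 * a - 1) / (a + 1)

# Usage: a cosine at 10.3 bins in a 256-sample record.
N = 256
tone = [math.cos(2 * math.pi * 10.3 * i / N + 0.4) for i in range(N)]
estimate = hann_interpolated_frequency(tone)
```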


Book Chapter
13 Sep 2009
TL;DR: It is shown that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes.
Abstract: In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. We successfully achieved a performance of over 401 GFlops on 256 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) for a 256³-point FFT.

37 citations
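Below is a sequential sketch of the decomposition that parallel 3D FFTs distribute: a 3D DFT is three passes of 1D FFTs, one along each axis. With a two-dimensional (pencil) decomposition, each MPI process would own complete lines along the current axis. The transposes between passes would then become all-to-all exchanges within rows or columns of the process grid. The sketch only notes that in comments and does not reproduce the paper's multicolumn implementation.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def fft3d(a):
    """3D FFT of a nested list a[z][y][x] as three passes of 1D FFTs.

    In a pencil decomposition each pass works on lines owned entirely by one
    process, and an all-to-all exchange re-orients the pencils between passes.
    Here everything runs in one process, so the exchange is just a change of
    loop order. The input list is not modified.
    """
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    a = [[fft(a[z][y]) for y in range(ny)] for z in range(nz)]   # pass 1: x-lines
    for z in range(nz):                                           # pass 2: y-lines
        for x in range(nx):
            line = fft([a[z][y][x] for y in range(ny)])
            for y in range(ny):
                a[z][y][x] = line[y]
    for y in range(ny):                                           # pass 3: z-lines
        for x in range(nx):
            line = fft([a[z][y][x] for z in range(nz)])
            for z in range(nz):
                a[z][y][x] = line[z]
    return a
```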


Journal Article
TL;DR: A memory efficient approximation to the nonuniform Fourier transform of a support limited sequence is derived based on the theory of shift-invariant representations and an exact expression for the worst-case mean square approximation error is derived.
Abstract: The main focus of this paper is to derive a memory efficient approximation to the nonuniform Fourier transform of a support limited sequence. We show that the standard nonuniform fast Fourier transform (NUFFT) scheme is a shift-invariant approximation of the exact Fourier transform. Based on the theory of shift-invariant representations, we derive an exact expression for the worst-case mean square approximation error. Using this metric, we evaluate the optimal scale factors and the interpolator that provides the least approximation error. We also derive an upper bound for the error component due to the lookup-table-based evaluation of the interpolator; we use this metric to ensure that this component is not the dominant one. Theoretical and experimental comparisons with standard NUFFT schemes clearly demonstrate a significant improvement in accuracy over conventional schemes, especially when the size of the uniform fast Fourier transform (FFT) is small. Since the memory requirement of the algorithm depends on the size of the uniform FFT, the proposed developments can lead to iterative signal reconstruction algorithms with significantly lower memory demands.

33 citations
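The sketch below makes the "shift-invariant approximation" concrete with a deliberately simple NUFFT: linear interpolation from an oversampled uniform grid. This is not the paper's optimized interpolator. The large oversampling factor it needs for good accuracy is exactly the memory cost the paper aims to reduce.

```python
import cmath
import math

def ndft(x, omegas):
    """Exact nonuniform DFT: X(w) = sum_n x[n] * exp(-i*w*n)."""
    return [sum(xn * cmath.exp(-1j * w * n) for n, xn in enumerate(x)) for w in omegas]

def nufft_linear(x, omegas, oversample=16):
    """Approximate X(w) by linear interpolation between uniform grid samples.

    The grid is w_k = 2*pi*k/K with K = oversample * len(x); an implementation
    would obtain these values from one zero-padded FFT of size K (they are
    computed directly here for brevity).
    """
    K = oversample * len(x)
    grid = ndft(x, [2 * math.pi * k / K for k in range(K)])
    out = []
    for w in omegas:
        pos = (w % (2 * math.pi)) * K / (2 * math.pi)
        k = int(pos) % K
        frac = pos - int(pos)
        out.append((1 - frac) * grid[k] + frac * grid[(k + 1) % K])
    return out

x = [1.0] * 8
ws = [0.1, 1.0, 2.5, 4.0, 6.0]
approx = nufft_linear(x, ws, oversample=16)
exact = ndft(x, ws)
```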


Proceedings Article
04 Oct 2009
TL;DR: The sine and cosine twiddle factors in the conventional FFT architecture are replaced by non-iterative CORDIC micro-rotations, which allow a substantial (~50%) reduction in read-only memory (ROM) table size and total removal of complex multipliers.
Abstract: A novel technique for implementing very high speed FFTs based on unrolled CORDIC structures is proposed in this paper. There has been a lot of research on FFT algorithm implementation; most of it focuses on reducing computational complexity through the selection and efficient decomposition of the FFT algorithm. However, there has been little research on using CORDIC structures for FFT implementations, especially for large, high-speed, high-throughput FFTs, due to the recursive nature of CORDIC algorithms. The key idea of this paper is to replace the sine and cosine twiddle factors in the conventional FFT architecture with non-iterative CORDIC micro-rotations, which allows a substantial (~50%) reduction in read-only memory (ROM) table size and total removal of complex multipliers. A new method to derive the optimal unrolling/unfolding factor for a desired FFT application based on the MSE (mean square error) is also proposed in this paper. Implemented on a Virtex-4 FPGA, the CORDIC-based FFT runs 3.9 times faster and occupies 37% less area than an equivalent complex-multiplier-based FFT implementation.

32 citations
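Here is a software sketch of the core operation: rotating a sample by the twiddle angle with CORDIC shift-and-add steps instead of a complex multiplication. It is written as a loop. An unrolled hardware pipeline would instantiate one stage per iteration, and the gain correction is a precomputable constant. The iteration count and the quadrant pre-rotation are assumptions of this sketch, not details from the paper.

```python
import math

def cordic_rotate(z, theta, iterations=24):
    """Rotate complex sample z by theta radians using CORDIC micro-rotations.

    Each step uses only a multiplication by 2**-i (a shift in hardware) and
    additions; the accumulated gain is divided out once at the end.
    """
    x, y = z.real, z.imag
    theta = math.remainder(theta, 2 * math.pi)   # now in [-pi, pi]
    # Pre-rotate by +/-90 degrees (a swap and a negation) so |theta| <= pi/2,
    # inside the CORDIC convergence range of about 1.74 rad.
    if theta > math.pi / 2:
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:
        x, y, theta = y, -x, theta + math.pi / 2
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if theta >= 0 else -1.0          # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        theta -= d * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))  # constant, precomputable gain
    return complex(x / gain, y / gain)

# Usage: apply the twiddle factor W_16^3 = exp(-2*pi*i*3/16) to a sample.
rotated = cordic_rotate(0.3 - 0.8j, -2 * math.pi * 3 / 16)
```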


Journal Article
TL;DR: A cache-friendly version of van der Hoeven's truncated FFT and inverse truncated FFT is presented, focusing on the case of 'large' coefficients, such as those arising in the Schönhage-Strassen algorithm for multiplication in Z[x].

30 citations


Journal Article
01 Jul 2009
TL;DR: A pipelined Fast Fourier Transform and its inverse (FFT/IFFT) processor, which utilizes hardware resources efficiently, is proposed for MIMO-OFDM WLAN 802.11n.
Abstract: A pipelined Fast Fourier Transform and its inverse (FFT/IFFT) processor, which utilizes hardware resources efficiently, is proposed for MIMO-OFDM WLAN 802.11n. Compared with a conventional MIMO-OFDM implementation (which uses as many FFT/IFFT processors as there are transmit/receive antennas), the proposed architecture, which shares hardware among multiple data sequences, reduces hardware complexity without sacrificing system throughput. Further, the proposed architecture can support 1 to 4 input data sequences with sequence lengths of 64 or 128, as needed. The FFT/IFFT processor is synthesized in TSMC 0.18 µm CMOS technology and saves 25% area compared with a conventional implementation using the radix-2³ algorithm. The proposed FFT/IFFT processor can be configured to improve power efficiency according to the number of input data sequences and the sequence length. The processor consumes 38 mW at 75 MHz for one 64-point input sequence and 87 mW at 75 MHz for four 128-point input sequences, and can be used efficiently for the IEEE 802.11n WLAN standard.

29 citations


Journal Article
TL;DR: A new frequency-domain algorithm, the planar Taylor expansion through the fast Fourier transform (FFT) method, has been developed to speed up the computation of Green's-function-related formulas in the half-space scenario for both the near field and the far field.
Abstract: A new frequency-domain algorithm, the planar Taylor expansion through the fast Fourier transform (FFT) method, has been developed to speed up the computation of Green's-function-related formulas in the half-space scenario for both the near field (NF) and the far field (FF). Two types of Taylor-FFT algorithms are presented in this paper: the spatial Taylor-FFT and the spectral Taylor-FFT. The former computes the NF and the latter computes the FF, or the Fourier spectrum. The planar Taylor-FFT algorithm has a computational complexity of O(N² log₂ N²) for an N × N computational grid, comparable to the multilevel fast multipole method (MLFMM). More importantly, the narrowband property of many electromagnetic fields allows the Taylor-FFT algorithm to use a larger sampling spacing, which is limited by the transverse wavenumber. In addition, the algorithm is free of singularities. An accuracy of -50 dB is easily obtained with the planar Taylor-FFT algorithm, and an accuracy of -80 dB is possible when the algorithm is optimized. The algorithm works particularly well for narrowband fields and quasi-planar geometries.

28 citations


Proceedings Article
14 Jun 2009
TL;DR: In this paper, the authors compare three algorithms (non-equispaced DFT, interpolated FFT and non-equispaced FFT) for OCT imaging in terms of speed and accuracy.
Abstract: In OCT imaging, the spectra used for the Fourier transformation are in general not acquired linearly in k-space. Therefore one needs an algorithm that resamples the data and then performs the Fourier transformation to obtain depth information. We compare three algorithms (non-equispaced DFT, interpolated FFT and non-equispaced FFT) for this purpose in terms of speed and accuracy. The optimal algorithm depends on the OCT device (speed, SNR) and the object.

23 citations
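The sketch below illustrates two of the three approaches compared here. It assumes a simulated single-reflector interferogram and a hypothetical smooth k-space nonlinearity. The first approach linearly resamples onto a uniform k grid and then applies a DFT (the interpolated FFT). The second evaluates a direct non-equispaced DFT at the true k positions. A naive O(M²) DFT stands in for the FFT, and the non-equispaced FFT is not shown.

```python
import bisect
import cmath
import math

def linear_resample(u_samples, values, n_out):
    """Linearly interpolate (u_samples, values) onto the uniform grid 0..n_out-1."""
    out = []
    for u in range(n_out):
        i = bisect.bisect_right(u_samples, u) - 1
        i = min(max(i, 0), len(u_samples) - 2)
        t = (u - u_samples[i]) / (u_samples[i + 1] - u_samples[i])
        out.append(values[i] + t * (values[i + 1] - values[i]))
    return out

def dft(values):
    """Naive DFT standing in for an FFT."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * q * m / n) for m, v in enumerate(values))
            for q in range(n)]

def ndft_depth(u_samples, values, n_out):
    """Depth profile evaluated directly at the true (nonuniform) k positions."""
    return [sum(v * cmath.exp(-2j * math.pi * q * u / n_out) for u, v in zip(u_samples, values))
            for q in range(n_out)]

M, depth_bin = 256, 40
# Hypothetical monotonic nonlinearity: sample j is acquired at k-index u_j.
u = [j + 10 * math.sin(math.pi * j / (M - 1)) for j in range(M)]
spectrum = [math.cos(2 * math.pi * depth_bin * uj / M) for uj in u]

interp_profile = [abs(c) for c in dft(linear_resample(u, spectrum, M))]
ndft_profile = [abs(c) for c in ndft_depth(u, spectrum, M)]
```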


Proceedings Article
07 Jun 2009
TL;DR: An efficient addressing scheme for a radix-4 FFT processor avoids modulo-r addition in address generation, and its critical path is independent of the FFT length N, making it extremely efficient for large FFTs.
Abstract: In this study, an efficient addressing scheme for a radix-4 FFT processor is presented. The proposed method uses extra registers to buffer and reorder the data inputs of the butterfly unit. It avoids modulo-r addition in address generation; hence, the critical path is significantly shorter than in conventional radix-4 FFT implementations. A significant property of the proposed method is that the critical path of the address generator is independent of the FFT length N, making it extremely efficient for large FFTs. For performance evaluation, the new FFT architecture has been implemented on FPGA (Altera Stratix) hardware and also synthesized in 0.18 µm CMOS technology. The results confirm the speed and area advantages for large FFTs. Although only radix-4 FFT address generation is presented in the paper, the method can also be used for higher-radix FFTs.
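For reference, this is the conventional radix-4 address generation that such schemes improve on. At each stage, a butterfly reads four points spaced L/4 apart inside a sub-transform of length L. The sketch is a baseline written for clarity, and the paper's register-based reordering is not reproduced. The output lands in base-4 digit-reversed order.

```python
import cmath

def radix4_addresses(n, stage):
    """Yield (addresses, j, stride) for every butterfly of one radix-4 DIF stage.

    At stage s the sub-transforms have length L = n / 4**s; a butterfly reads
    four points spaced L/4 apart and uses twiddle exponent q*j*stride, with
    stride = n / L.
    """
    span = n // 4 ** (stage + 1)
    stride = n // (4 * span)
    for base in range(0, n, 4 * span):
        for j in range(span):
            yield [base + j + q * span for q in range(4)], j, stride

def radix4_fft_inplace(x):
    """In-place radix-4 DIF FFT; the result is in base-4 digit-reversed order."""
    n = len(x)
    stages = 0
    while 4 ** stages < n:
        stages += 1
    assert 4 ** stages == n, "length must be a power of 4"
    for s in range(stages):
        for addr, j, stride in radix4_addresses(n, s):
            a0, a1, a2, a3 = (x[i] for i in addr)
            outs = (a0 + a1 + a2 + a3,
                    a0 - 1j * a1 - a2 + 1j * a3,
                    a0 - a1 + a2 - a3,
                    a0 + 1j * a1 - a2 - 1j * a3)
            for q in range(4):
                x[addr[q]] = outs[q] * cmath.exp(-2j * cmath.pi * q * j * stride / n)
    return x

def digit_reverse4(i, digits):
    """Reverse the base-4 digits of i (using `digits` digits)."""
    r = 0
    for _ in range(digits):
        r, i = 4 * r + i % 4, i // 4
    return r
```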

Proceedings Article
19 Apr 2009
TL;DR: Experimental results show that using the pruned FFT can indeed speed up the fastest available FFT implementations by up to 30% when the problem size and the pattern of unused inputs and outputs are known in advance.
Abstract: We derive a recursive general-radix pruned Cooley-Tukey fast Fourier transform (FFT) algorithm in Kronecker product notation. The algorithm is compatible with the vectorization and parallelization required on state-of-the-art multicore CPUs. We include the pruned FFT algorithm in the program generation system Spiral and automatically generate optimized implementations of the pruned FFT for the Intel Core 2 Duo multicore processor. Experimental results show that using the pruned FFT can indeed speed up the fastest available FFT implementations by up to 30% when the problem size and the pattern of unused inputs and outputs are known in advance.
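Spiral derives its pruned FFTs in Kronecker product notation, but the underlying idea can be seen on a plain recursive radix-2 FFT. This sketch assumes that only the first `nonzero` inputs can be nonzero. It skips every sub-transform whose inputs are all known to be zero. It illustrates input pruning only and is not Spiral's generated code.

```python
import cmath

def pruned_fft(x, nonzero):
    """Radix-2 DIT FFT of x, assuming only x[:nonzero] may be nonzero.

    Sub-transforms whose inputs are all known zeros are skipped, and a
    sub-transform with a single nonzero input (index 0) is a constant vector.
    len(x) must be a power of two.
    """
    n = len(x)
    if nonzero == 0:
        return [0j] * n
    if nonzero == 1:
        return [complex(x[0])] * n
    # Even-indexed inputs keep ceil(nonzero/2) nonzeros, odd-indexed floor(nonzero/2).
    even = pruned_fft(x[0::2], (nonzero + 1) // 2)
    odd = pruned_fft(x[1::2], nonzero // 2)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

# Usage: a 16-point transform of a signal with only four leading nonzero samples.
padded = [1, 2, -1, 0.5] + [0] * 12
spectrum = pruned_fft(padded, 4)
```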

Journal Article
TL;DR: Expressions for the field reconstructed from samples of the diffracted wave produced by illuminating an object are derived from different diffraction integrals in digital holography, and numerical reconstruction methods based on the fast Fourier transform (FFT), including a modified FFT-based direct integration method for the Rayleigh-Sommerfeld integral, are compared.
Abstract: Expressions for the field reconstructed from samples of the diffracted wave produced by illuminating an object are derived using different diffraction integrals in digital holography. The numerical reconstruction methods that truncate and sample this field are compared in overlapping quality, accuracy, pixel resolution, computation window, and speed. The fast Fourier transform (FFT)-based direct integration method for the Fresnel integral and the modified FFT-based direct integration method for the Rayleigh-Sommerfeld integral have similar overlapping quality and can flexibly control pixel resolution and computation window size. Meanwhile, the FFT-based angular spectrum method is superior to the FFT-based convolution method in accuracy and speed. Experimental results are presented to verify these conclusions.
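Below is a one-dimensional sketch of the angular spectrum method mentioned above. It assumes a sampled complex field, uses a naive DFT in place of the FFT, and takes hypothetical parameters. Each plane-wave component is multiplied by exp(i z √(k² − kx²)), and the result is transformed back. The 2D holographic case applies the same transfer function over (kx, ky).

```python
import cmath
import math

def dft(u, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(u)
    return [sum(u[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def angular_spectrum_1d(field, dx, wavelength, z):
    """Propagate a sampled 1D complex field a distance z.

    Each plane-wave component with transverse wavenumber kx is multiplied by
    exp(i * z * sqrt(k**2 - kx**2)); for |kx| > k the square root is imaginary
    and the (evanescent) component decays for z > 0.
    """
    n = len(field)
    k = 2 * math.pi / wavelength
    spectrum = dft(field)
    propagated = []
    for idx, s in enumerate(spectrum):
        f = idx if idx < n // 2 else idx - n            # signed frequency index
        kx = 2 * math.pi * f / (n * dx)
        kz = cmath.sqrt(k * k - kx * kx)
        propagated.append(s * cmath.exp(1j * kz * z))
    return [v / n for v in dft(propagated, sign=+1)]   # inverse DFT

# Usage: a Gaussian aperture propagated 50 wavelengths (dx = 2 wavelengths).
N = 64
field = [cmath.exp(-((m - 32) * 2.0 / 10.0) ** 2) for m in range(N)]
forward = angular_spectrum_1d(field, dx=2.0, wavelength=1.0, z=50.0)
```

With dx = 2λ, every sampled kx is below k, so all components propagate and the transfer function is a pure phase.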

Patent
05 Jun 2009
TL;DR: In this patent, a Fast Fourier Transform (FFT) processing unit computes a current FFT vector and an accumulated previous FFT vector, both of which correspond to sample vectors associated with preamble symbols.
Abstract: An embodiment is a method and apparatus to perform symbol synchronization. A correlation estimator computes a correlation of a sample vector representative of a narrowband signal. A synchronization detector detects symbol synchronization. Another embodiment is a method and apparatus to perform frame synchronization. A Fast Fourier Transform (FFT) processing unit computes a current FFT vector and an accumulated previous FFT vector. The current FFT vector and the accumulated previous FFT vector correspond to sample vectors associated with preamble symbols. A real and imaginary processing unit generates real and imaginary summations using the current FFT vector and the accumulated previous FFT vector. A mode processor generates mode flags representing operational modes using the real and imaginary summations.

Proceedings Article
01 Dec 2009
TL;DR: This paper shows that the Slice Theorem is also valid within the NTT and that it can be utilized as a new exact, integer-only and fast inversion scheme for the FRT, with the same computational complexity as the FFT.
Abstract: This paper presents a new fast method to map between images and their digital projections based on the Number Theoretic Transform (NTT) and the Finite Radon Transform (FRT). The FRT is a Discrete Radon Transform (DRT) defined on the same finite geometry as the Finite or Discrete Fourier Transform (DFT). Consequently, it may be inverted directly and exactly via the Fast Fourier Transform (FFT) without any interpolation or filtering [1]. As with the FFT, the FRT can be adapted to square images of arbitrary sizes such as dyadic images, prime-adic images and arbitrary-sized images. However, its simplest form is that of prime-sized images [2]. The FRT also preserves the discrete versions of both the Fourier Slice Theorem (FST) and Convolution Property of the Radon Transform (RT). The NTT is also defined on the same geometry as the DFT and preserves the Circular Convolution Property (CCP) of the DFT [3, 4]. This paper shows that the Slice Theorem is also valid within the NTT and that it can be utilized as a new exact, integer-only and fast inversion scheme for the FRT, with the same computational complexity as the FFT. Digital convolutions and exact digital filtering of projections can also be performed using this Number Theoretic FRT (NFRT).
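The property this entry relies on is the circular convolution property of the NTT, which can be checked with a short integer-only sketch. The modulus 998244353 = 119·2²³ + 1 with primitive root 3 is an assumption chosen for illustration. The FRT and the NTT slice theorem are not shown.

```python
P = 998244353          # prime 119 * 2**23 + 1 (assumed modulus)
G = 3                  # primitive root modulo P

def ntt(a, inverse=False):
    """Direct O(N^2) number theoretic transform modulo P (len(a) must divide P - 1)."""
    n = len(a)
    w = pow(G, (P - 1) // n, P)          # primitive n-th root of unity mod P
    if inverse:
        w = pow(w, P - 2, P)             # modular inverse of w (Fermat)
    out = [sum(a[j] * pow(w, j * k, P) for j in range(n)) % P for k in range(n)]
    if inverse:
        n_inv = pow(n, P - 2, P)
        out = [v * n_inv % P for v in out]
    return out

def cyclic_convolve(a, b):
    """Exact cyclic convolution of integer sequences via the NTT (results mod P)."""
    fa, fb = ntt(a), ntt(b)
    return ntt([x * y % P for x, y in zip(fa, fb)], inverse=True)
```

Because every step is modular integer arithmetic, the result is exact, with no rounding as in a floating-point FFT, provided the true outputs are smaller than P.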

Journal Article
TL;DR: In this paper, an efficient architecture for permuting data streams in place, based on properties of the symmetric group in abstract algebra, is presented. The architecture uses half the memory of a conventional double-buffering architecture with only a modest increase in addressing logic, and the streaming FFT architectures built on it require only 25% more buffering than the theoretical minimum for normal-ordered frequency output.
Abstract: We have developed an efficient architecture for permuting data streams in-place based on properties of the symmetric group in abstract algebra. This architecture uses half the memory of a conventional double-buffering architecture with only a modest increase in addressing logic. The flexibility and efficiency of this permuter has enabled the development of an automatic generator of streaming Fast Fourier Transform (FFT) architectures capable of handling a configurable number of time samples in parallel. These architectures achieve 100% multiplier utilization efficiency, and require only 25% more buffering than the theoretical minimum for normal-ordered frequency output. We present parametrized generators of these permutation and FFT architectures in an open-source library targeting field programmable gate arrays.
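The following is a generic single-buffer sketch of in-place stream permutation, not the paper's architecture. Each output sample is read from a buffer slot, and the next frame's sample is written into the slot just freed. In this scheme, the write-address pattern for frame f is the permutation composed with itself f times. The number of distinct address patterns therefore equals the permutation's order in the symmetric group, which hints at why group-theoretic properties matter here. The function name is hypothetical.

```python
def stream_permute(frames, perm):
    """Apply output[i] = frame[perm[i]] to a stream of frames using one buffer.

    The buffer holds exactly one frame; each slot is read once per frame and
    immediately refilled with the next frame's sample, so the address pattern
    changes from frame to frame (it is perm composed with itself f times).
    """
    n = len(perm)
    buf = [None] * n
    write = list(range(n))               # address pattern of the stored frame
    for i, sample in enumerate(frames[0]):
        buf[write[i]] = sample
    out = []
    for f in range(len(frames)):
        read = [write[perm[i]] for i in range(n)]
        frame_out = []
        for i in range(n):
            frame_out.append(buf[read[i]])          # emit output sample i of frame f
            if f + 1 < len(frames):
                buf[read[i]] = frames[f + 1][i]     # reuse the freed slot
        out.append(frame_out)
        write = read                                # next frame's address pattern
    return out

# Usage: 8-point bit-reversal, the reordering a streaming radix-2 FFT needs.
bit_reverse_8 = [0, 4, 2, 6, 1, 5, 3, 7]
frames = [[10 * f + i for i in range(8)] for f in range(4)]
reordered = stream_permute(frames, bit_reverse_8)
```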

Patent
05 Jun 2009
TL;DR: In this patent, a Fast Fourier Transform (FFT) processing unit computes a current FFT vector and an accumulated previous FFT vector, which correspond to sample vectors associated with preamble symbols prior to symbol synchronization detection.
Abstract: An embodiment is a method and apparatus to perform symbol synchronization. A sign element obtains signs of samples in a sample vector. A correlation estimator computes a correlation of the sample vector. A synchronization detector detects symbol synchronization. Another embodiment is a method and apparatus to perform frame synchronization. A Fast Fourier Transform (FFT) processing unit computes a current FFT vector and an accumulated previous FFT vector. The current FFT vector and the accumulated previous FFT vector correspond to sample vectors associated with preamble symbols prior to symbol synchronization detection. A real and imaginary processing unit generates real and imaginary summations using the current FFT vector and the accumulated previous FFT vector. A mode processor generates mode flags representing operational modes using the real and imaginary summations.

Proceedings Article
05 Jul 2009
TL;DR: A new approach for higher-radix butterflies suitable for pipeline implementation is described, in which the radix-r butterfly computation is formulated as composite engines that implement each of the butterfly computations.
Abstract: This article describes a new approach for higher-radix butterflies suitable for pipeline implementation. Based on the butterfly computation introduced by Cooley-Tukey [1], we introduce a novel factorization of the Discrete Fourier Transform (DFT) that redefines the butterfly computation to make it more suitable for efficient VLSI implementation. This factorization motivated us to present a new concept of a radix-r Fast Fourier Transform (FFT), in which the radix-r butterfly computation is formulated as composite engines that implement each of the butterfly computations. This concept enables the radix-r butterfly processing element (BPE) to be designed with only one complex multiplier in the butterfly critical path for any given r. The algorithmic description and performance of the low-complexity FFT method are considered in this paper, and parallel pipelined FFT processing in a companion paper [15], Part II: Parallel Pipelined FFT Processing.

Journal Article
TL;DR: Two radix-2 families of fast Fourier transform algorithms that have the property that both inputs and outputs are addressed in natural order are derived in this letter.
Abstract: Two radix-2 families of fast Fourier transform (FFT) algorithms that have the property that both inputs and outputs are addressed in natural order are derived in this letter. The algorithms obtained have the same complexity as the Cooley-Tukey radix-2 algorithms but avoid the bit-reversal ordering applied to the input. These algorithms can be thought of as a variation of the radix-2 Cooley-Tukey ones.
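One well-known FFT with both input and output in natural order is the Stockham autosort radix-2 FFT. It trades the bit-reversal pass for a second work array. It is shown here as a related example, not necessarily one of the two families derived in the letter.

```python
import cmath

def stockham_fft(x):
    """Stockham autosort radix-2 FFT: natural-order input and output.

    Before a stage with sub-transform length L (r = n / L), the length-L/2
    transform of the stride-2r subsequence starting at k is stored with
    frequency j at index j*2r + k; each stage combines pairs into that layout
    for length L, ping-ponging between two arrays, so no bit reversal is needed.
    len(x) must be a power of two.
    """
    n = len(x)
    a = [complex(v) for v in x]
    b = [0j] * n
    L = 1
    while L < n:
        L *= 2
        half, r = L // 2, n // L
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / L)
            for k in range(r):
                u = a[j * 2 * r + k]
                t = w * a[j * 2 * r + r + k]
                b[j * r + k] = u + t
                b[(j + half) * r + k] = u - t
        a, b = b, a
    return a
```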

Journal Article
TL;DR: A pre-calculation technique for real-time FFT, which constructs and computes the butterfly modules while the incoming data are still being collected, is presented; it is a better choice for critical missions that require a shorter time to complete the FFT calculation.

Proceedings Article
23 May 2009
TL;DR: The most common FFT algorithm, Cooley-Tukey, factorizes a large FFT into a combination of smaller ones, and the choice of factors and the order in which they are applied are critical to the ultimate performance of the large FFT.
Abstract: The Fast Fourier Transform (FFT) has been considered one of the most important computing algorithms for decades. Its vast application domain makes it an important performance benchmark for new computer architectures. The most common FFT algorithm, Cooley-Tukey, factorizes a large FFT into a combination of smaller ones. The choice of factors and the order in which they are applied are critical to the ultimate performance of the large FFT.
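Below is a sketch of how the choice and order of factors enter a mixed-radix Cooley-Tukey FFT. `factors` is a hypothetical parameter that lists the radices applied from the outermost split inward. Every ordering gives the same result. What changes is the memory-access pattern and operation count, which this sketch does not measure.

```python
import cmath

def naive_dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) for k in range(n)]

def ct_fft(x, factors):
    """Mixed-radix decimation-in-time Cooley-Tukey FFT.

    factors[0] = p splits x into p interleaved subsequences x[r::p] of length
    n/p, which are transformed recursively with factors[1:]; any length left
    when the factor list runs out is handled by a direct DFT.
    """
    n = len(x)
    if not factors or n == 1:
        return naive_dft(x)
    p = factors[0]
    if n % p:
        raise ValueError(f"factor {p} does not divide length {n}")
    m = n // p
    subs = [ct_fft(x[r::p], factors[1:]) for r in range(p)]
    # Twiddle-and-combine step: X[k] = sum_r W_n^(r*k) * S_r[k mod m]
    return [sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m] for r in range(p))
            for k in range(n)]

# Usage: the same 12-point transform with two different factor orders.
x12 = [complex(i % 5 - 2, (i * i) % 3) for i in range(12)]
via_232 = ct_fft(x12, [2, 3, 2])
via_322 = ct_fft(x12, [3, 2, 2])
```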

Journal Article
TL;DR: Two efficient algorithms for computing partial Fourier transforms in one and two dimensions are introduced, motivated by the wave extrapolation procedure in reflection seismology; both decompose the summation domain into simpler components in a multiscale way.
Abstract: We introduce two efficient algorithms for computing the partial Fourier transforms in one and two dimensions. Our study is motivated by the wave extrapolation procedure in reflection seismology. In both algorithms, the main idea is to decompose the summation domain into simpler components in a multiscale way. Existing fast algorithms are then applied to each component to obtain optimal complexity. The algorithm in 1D is exact and takes O(N log² N) steps. Our solution in 2D is an approximate but accurate algorithm that takes O(N² log² N) steps. In both cases, the complexities are almost linear in terms of the degree of freedom. We provide numerical results on several test examples.

Journal Article
TL;DR: This paper, part 2 of a series on the DFT and IDFT, presents a radix-2 implementation of the FFT and IFFT; part 3 of the series demonstrates the computation of the PSD (Power Spectral Density) and applications of the DFT and IDFT, which include filtering, windowing, pitch shifting and the spectral analysis of re-sampling.
Abstract: This paper is part 2 in a series of papers about the Discrete Fourier Transform (DFT) and the Inverse Discrete Fourier Transform (IDFT). The focus of this paper is on fast implementations of the DFT and IDFT, called the FFT (Fast Fourier Transform) and the IFFT (Inverse Fast Fourier Transform). The implementation is based on a well-known algorithm, called the Radix 2 FFT, and requires that its input data be an integral power of two in length. Part 3 of this series demonstrates the computation of the PSD (Power Spectral Density) and applications of the DFT and IDFT. The applications include filtering, windowing, pitch shifting and the spectral analysis of re-sampling.
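Here is a minimal Python counterpart to the radix-2 FFT and IFFT described in this entry; the paper's own code is not reproduced. The IFFT reuses the forward transform through the conjugation identity. The power-of-two length requirement is enforced explicitly.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("radix-2 FFT needs a length that is a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft_radix2(x[0::2]), fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle times odd half
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft_radix2(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft_radix2([complex(v).conjugate() for v in X])]

samples = [1, 2, 3, 4, 0, -1, 0.5, 2]
roundtrip = ifft_radix2(fft_radix2(samples))
```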

Patent
19 Aug 2009
TL;DR: An FFT/IFFT apparatus and method are provided in this patent; the apparatus comprises a storage unit, which has as many addresses as the number of bits of input data, and first, second and third FFT/IFFT units.
Abstract: An FFT/IFFT apparatus and method are provided. The FFT/IFFT apparatus includes a storage unit, a first FFT/IFFT unit, a second FFT/IFFT unit, and a third FFT/IFFT unit. The storage unit has as many addresses as the number of bits of input data. The first FFT/IFFT unit sequentially stores half of the input data in the storage unit, performs a first-point FFT/IFFT operation while sequentially receiving the other half of the input data, and stores the first-point FFT/IFFT operation result in the storage unit. The second FFT/IFFT unit performs a second-point FFT/IFFT operation on the first-point FFT/IFFTed data, and stores the second-point FFT/IFFT operation result in the storage unit. The third FFT/IFFT unit performs a third-point FFT/IFFT operation on the second-point FFT/IFFTed data, and stores the third-point FFT/IFFT operation result in the storage unit.

01 Jan 2009
TL;DR: In this paper, a harmonic estimation method based on an interpolated fast Fourier transform (FFT) algorithm is proposed to reduce the influence of an unsynchronized sample sequence on the FFT and to improve the precision of harmonic analysis in power systems.
Abstract: In order to reduce the influence of an unsynchronized sample sequence on the fast Fourier transform (FFT) and to improve the precision of harmonic analysis in power systems, a harmonic estimation method based on an interpolated FFT algorithm is proposed. The estimation algorithm based on the five-coefficient Rife-Vincent (I) window is discussed in detail, and estimation formulas for the harmonic parameters are derived in this paper. A simulation model of a real industrial 400/33 kV power system is established using the Alternative Transients Program (ATP), and the harmonic current in the system is simulated. Considering varying degrees of spectral leakage, the harmonic current is analyzed both with the FFT algorithm directly and with the five-coefficient Rife-Vincent (I) window interpolated FFT algorithm; the parameters of 16 harmonics are calculated, and error graphs of the parameter estimates are plotted. Finally, the simulation results of the two algorithms are analyzed and compared. The simulation results demonstrate that the harmonic frequency, amplitude and phase estimated with the improved algorithm are highly accurate for unsynchronized sample sequences.
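The Rife-Vincent class I windows are the cosine windows with maximum sidelobe decay. The sketch below generates them, assuming the standard binomial form of their coefficients, a_m = 2·C(2p, p−m)/C(2p, p). The five-coefficient window used in this entry corresponds to p = 4. The interpolation formulas for the harmonic parameters are not reproduced, and the function names are hypothetical.

```python
import math

def rife_vincent_1_coefficients(p):
    """Coefficients a_0..a_p of the Rife-Vincent class I window of order p.

    a_0 = 1 and a_m = 2 * C(2p, p - m) / C(2p, p); p = 1 gives the Hann shape
    (scaled by 2) and p = 4 gives the five-term window.
    """
    center = math.comb(2 * p, p)
    return [1.0] + [2 * math.comb(2 * p, p - m) / center for m in range(1, p + 1)]

def rife_vincent_1_window(n, p):
    """Periodic window w[k] = sum_m (-1)**m * a_m * cos(2*pi*m*k/n)."""
    a = rife_vincent_1_coefficients(p)
    return [sum((-1) ** m * am * math.cos(2 * math.pi * m * k / n) for m, am in enumerate(a))
            for k in range(n)]

five_term = rife_vincent_1_coefficients(4)   # about [1, 1.6, 0.8, 0.228571, 0.028571]
window = rife_vincent_1_window(64, 4)
```

The window is zero at k = 0 and peaks at the centre with value 2^(2p)/C(2p, p). Its sidelobes fall off faster as p grows, which is what suppresses leakage from neighbouring harmonics.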

Posted Content
TL;DR: In this article, the Cumulative Spectral Power (CSP) is proposed as a promising means to overcome some of the limitations of the Fourier transform for time varying signals with noise.
Abstract: Although the Fast Fourier Transform (FFT) is an old and widely used tool, it is still possible to find new insights and applications from FFT-based analyses. The FFT is frequently used to generate the Power Spectral Density (PSD) function by squaring the spectral components after they have been corrected for the influence of the instrument that generated the data. Although the PSD improves on a raw-data spectrum by removing the influence of the instrument transfer function, it is still of limited value for time-varying signals with noise, due to the very nature of the Fourier transform. The authors present here another way to treat the FFT data, namely the Cumulative Spectral Power (CSP), as a promising means to overcome some of these limitations. As the examples provided show, the CSP holds promise in a variety of different fields.
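The abstract does not define the CSP. The sketch below assumes it is the running sum of the PSD over frequency, normalized to the total power. The instrument correction mentioned in the abstract is omitted, and the function names are hypothetical.

```python
import cmath
import math

def psd(x):
    """One-sided periodogram |X[k]|^2 / N for k = 0..N/2 (no instrument correction)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]

def cumulative_spectral_power(x):
    """Running sum of the PSD over frequency, normalized so the last value is 1."""
    p = psd(x)
    total = sum(p)
    running, out = 0.0, []
    for v in p:
        running += v
        out.append(running / total)
    return out

# Usage: for a pure tone at bin 5 the curve steps from 0 to 1 at that bin.
N = 64
tone = [math.cos(2 * math.pi * 5 * m / N) for m in range(N)]
csp = cumulative_spectral_power(tone)
```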

Proceedings Article
Zhi Dong, Yimeng Zhang, Zhiping Huang, Guilin Tang, Chunwu Liu
28 Dec 2009
TL;DR: A new method to compute the Cooley-Tukey FFT based on a pipelined stream is presented; it takes less than 2/3 of the time of previous methods and offers a better DFT method for high-speed digital signal processing.
Abstract: In order to reduce the calculation time of the FFT, this paper analyzes FFT theory and presents a new method to compute the Cooley-Tukey FFT based on a pipelined stream. Matlab and ISE simulations and FPGA experiments show that this method needs only 42 µs to compute a 4096-point FFT, less than 2/3 of the time required by previous methods. It offers a better DFT method for high-speed digital signal processing.

01 Jul 2009
TL;DR: In this paper, the authors propose an efficient variable-length radix-8/4/2 FFT architecture for OFDM systems, which uses an efficient "in-place" memory access method to maintain conflict-free data access and minimize memory size.
Abstract: In this paper, we propose an efficient variable-length radix-8/4/2 FFT architecture for OFDM systems. The proposed FFT processor is based on the radix-8 FFT algorithm. When the FFT length does not allow a radix-8 computation at the last stage, the processor computes a radix-4 or radix-2 stage instead. Furthermore, the proposed FFT architecture uses shared memory to minimize and simplify the hardware. We use an efficient "in-place" memory access method to maintain conflict-free data access and minimize memory size. The proposed FFT architecture supports FFT lengths of 64, 128, 256, 512, 1024, 2048, 4096 and 8192 points, which cover all the FFT lengths required by 802.11a, 802.16a, DAB, DVB-T, VDSL and ADSL.
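Below is a sketch of the stage selection described above, assuming power-of-two lengths. The length is covered by as many radix-8 stages as possible, then one radix-4 or radix-2 stage for the remaining factor of 4 or 2. The function name is hypothetical.

```python
def radix_plan(n):
    """Split a power-of-two FFT length into radix-8 stages plus at most one
    radix-4 or radix-2 stage for the leftover factor."""
    if n < 2 or n & (n - 1):
        raise ValueError("FFT length must be a power of two >= 2")
    log2n = n.bit_length() - 1
    plan = [8] * (log2n // 3)          # as many radix-8 stages as fit
    if log2n % 3 == 2:
        plan.append(4)                 # leftover factor 4
    elif log2n % 3 == 1:
        plan.append(2)                 # leftover factor 2
    return plan

# Usage: the stage radices for the lengths listed in the abstract.
plans = {n: radix_plan(n) for n in (64, 128, 256, 512, 1024, 2048, 4096, 8192)}
```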

Journal Article
TL;DR: To address spectral leakage when power system harmonics are analyzed with the Fast Fourier Transform, windowing is proposed to reduce leakage, and the results show that the FFT algorithm based on the Hanning window is effective at detecting harmonics in the power network.
Abstract: To address the spectral leakage problem that arises when power system harmonics are analyzed with the Fast Fourier Transform (FFT), the effectiveness of windowing in reducing leakage is demonstrated. In the study, the FFT algorithm based on the Hanning window, implemented in Matlab, improved the accuracy of harmonic frequency estimation. The simulation example showed that the FFT algorithm based on the Hanning window is effective at detecting harmonics in the power network.
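The sketch below shows the leakage reduction this entry relies on. It assumes a single tone half a bin off the DFT grid, which is the worst case. It then compares how much energy leaks ten bins away from the peak with a rectangular window and with a Hann window. Python with a direct DFT stands in for the Matlab study.

```python
import cmath
import math

def leakage_db(x, window, peak_bin, far_bin):
    """Magnitude of bin far_bin relative to bin peak_bin, in dB, after windowing."""
    n = len(x)
    xw = [xi * wi for xi, wi in zip(x, window)]

    def mag(k):
        return abs(sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(xw)))

    return 20 * math.log10(mag(far_bin) / mag(peak_bin))

N = 256
# A 50.5-cycle tone: half a bin off the grid, the worst case for leakage.
tone = [math.cos(2 * math.pi * 50.5 * m / N) for m in range(N)]
rect = [1.0] * N
hann = [0.5 - 0.5 * math.cos(2 * math.pi * m / N) for m in range(N)]
rect_leak = leakage_db(tone, rect, 50, 60)
hann_leak = leakage_db(tone, hann, 50, 60)
```

Ten bins from the peak, the rectangular window leaks on the order of -25 dB, while the Hann window's faster sidelobe decay pushes the leakage tens of dB lower.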

Proceedings Article
28 Sep 2009
TL;DR: A 2D FFT for 8×8 matrices that requires no data transposition, using multiple topologies on a 4×4 torus, is proposed; compared with other methods, it improves the frame rate by a factor of 8.
Abstract: In this paper, we propose a 2D FFT for 8×8 matrices that requires no data transposition, using multiple topologies on a 4×4 torus. The proposed 2D FFT performs 1D FFTs in parallel and pipelines the calculation. We implement the proposed architecture on a Xilinx Virtex-IV device and report a detailed evaluation based on maximum system frequency, chip area and image size. The implementation results show that the core speed of the proposed FFT architecture is around 157.3 MHz and that it occupies 11733 slices. The average SQNR for various images is 61.9 dB. Compared with other methods, the proposed 2D FFT improves the frame rate by a factor of 8.