Showing papers on "Split-radix FFT algorithm" published in 2016


Journal ArticleDOI
TL;DR: A fast algorithm is introduced for computing volume potentials, that is, the convolution of a translation-invariant, free-space Green's function with a compactly supported source distribution defined on a uniform grid.

84 citations


Journal ArticleDOI
TL;DR: This brief presents a new type of fast Fourier transform (FFT) hardware architectures called serial commutator (SC) FFT, based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated.
Abstract: This brief presents a new type of fast Fourier transform (FFT) hardware architectures called serial commutator (SC) FFT. The SC FFT is characterized by the use of circuits for bit-dimension permutation of serial data. The proposed architectures are based on the observation that, in the radix-2 FFT algorithm, only half of the samples at each stage must be rotated. This fact, together with a proper data management, makes it possible to allocate rotations only every other clock cycle. This allows for simplifying the rotator, halving the complexity with respect to conventional serial FFT architectures. Likewise, the proposed approach halves the number of adders in the butterflies with respect to previous architectures. As a result, the proposed architectures use the minimum number of adders, rotators, and memory that are necessary for a pipelined FFT of serial data, with 100% utilization ratio.

33 citations
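
The observation in the abstract above (only half of the samples at each radix-2 stage need a rotation) can be checked with a small software model. The following is a minimal sketch, not the SC FFT hardware architecture itself: a recursive radix-2 decimation-in-frequency FFT whose function name and rotation counter are illustrative assumptions. The counter includes trivial rotations by W^0.

```python
import cmath

def fft_dif_radix2(x):
    """Recursive radix-2 decimation-in-frequency FFT (length must be a power of two).

    Returns (spectrum, rotation_count); rotation_count counts every twiddle
    multiplication, including the trivial ones by W^0.
    """
    n = len(x)
    if n == 1:
        return list(x), 0
    half = n // 2
    # Butterfly sums feed the even-indexed outputs and need no rotation.
    upper = [x[i] + x[i + half] for i in range(half)]
    # Butterfly differences feed the odd-indexed outputs; only these are rotated.
    lower = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
             for i in range(half)]
    even, r_even = fft_dif_radix2(upper)
    odd, r_odd = fft_dif_radix2(lower)
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out, half + r_even + r_odd
```

For an 8-point input the counter reports 12 rotations, which is N/2 per stage over log2(N) = 3 stages, rather than N per stage.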


Journal ArticleDOI
TL;DR: The triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.
Abstract: In this paper we propose a new representation for FFT algorithms called the triangular matrix representation. This representation is more general than the binary tree representation and, therefore, it introduces new FFT algorithms that were not discovered before. Furthermore, the new representation has the advantage that it is simple and easy to understand, as each FFT algorithm only consists of a triangular matrix. Besides, the new representation allows for obtaining the exact twiddle factor values in the FFT flow graph easily. This facilitates the design of FFT hardware architectures. As a result, the triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.

32 citations


Journal ArticleDOI
TL;DR: Simulation results show that compared with the conventional radix-2 shared-memory implementations, the proposed design achieves over 20% lower power consumption when computing a 1024-point complex-valued transform.
Abstract: Split-radix fast Fourier transform (SRFFT) is an ideal candidate for the implementation of a low-power FFT processor, because it has the lowest number of arithmetic operations among all the FFT algorithms. In the design of such processors, an efficient addressing scheme for FFT data as well as twiddle factors is required. The signal flow graph of SRFFT is the same as radix-2 FFT, and therefore, the conventional address generation schemes of FFT data could also be applied to SRFFT. However, SRFFT has irregular locations of twiddle factors and forbids the application of radix-2 address generation methods. This brief presents a shared-memory low-power SRFFT processor architecture. We show that SRFFT can be computed by using a modified radix-2 butterfly unit. The butterfly unit exploits the multiplier-gating technique to save dynamic power at the expense of using more hardware resources. In addition, two novel address generation algorithms for both the trivial and nontrivial twiddle factors are developed. Simulation results show that compared with the conventional radix-2 shared-memory implementations, the proposed design achieves over 20% lower power consumption when computing a 1024-point complex-valued transform.

30 citations
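
For readers unfamiliar with the split-radix decomposition that the abstract above builds on, here is a minimal recursive sketch in software. It illustrates the textbook algorithm only (one half-length sub-transform over even samples plus two quarter-length sub-transforms over samples 4m+1 and 4m+3); it does not model the shared-memory processor or its address generation, and the function name is an assumption.

```python
import cmath

def split_radix_fft(x):
    """Textbook split-radix FFT for power-of-two lengths."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    e = split_radix_fft(x[0::2])    # even samples, length n/2
    o1 = split_radix_fft(x[1::4])   # samples 4m+1, length n/4
    o3 = split_radix_fft(x[3::4])   # samples 4m+3, length n/4
    q = n // 4
    out = [0j] * n
    for k in range(q):
        t1 = cmath.exp(-2j * cmath.pi * k / n) * o1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * o3[k]
        s = t1 + t3
        d = -1j * (t1 - t3)         # multiplication by -j is a trivial swap in hardware
        out[k] = e[k] + s
        out[k + 2 * q] = e[k] - s
        out[k + q] = e[k + q] + d
        out[k + 3 * q] = e[k + q] - d
    return out
```

Each L-shaped butterfly uses two general twiddle multiplications per four outputs, which is where the reduced arithmetic count relative to radix-2 comes from.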


Journal ArticleDOI
TL;DR: In this paper, a fast Fourier transform (FFT) based method for time-varying power system harmonic measurement is proposed, where the harmonic signal is preprocessed by infinite impulse response filter bank and Teager-Kaiser energy operator for fast detection of instability onset time.
Abstract: With the rapid development of power electronic devices, harmonic pollution has become one of the principal power quality problems in power systems. The fast Fourier transform (FFT) is widely used for analysing and measuring power system harmonics. However, limitations of the FFT, such as the aliasing effect, spectral leakage, and the picket-fence effect, lead to inaccurate results. Furthermore, a real power system harmonic is actually a non-stationary signal, while the FFT is a tool for stationary signals. This study focuses on a novel FFT-based method for time-varying power system harmonic measurement. The harmonic signal is preprocessed by an infinite impulse response filter bank and the Teager–Kaiser energy operator for fast detection of the instability onset time. Then an adaptive Kaiser self-convolution window-based interpolated FFT algorithm is used to estimate each harmonic component. The results of both simulation and practical implementation show that the proposed method is suitable for time-varying harmonics and achieves higher accuracy than traditional FFT-based techniques.

29 citations
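
The Teager–Kaiser energy operator mentioned in the abstract above has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below shows only that standard operator; the filter bank and the windowed interpolated FFT from the paper are not modelled, and the function name is an assumption.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy for the interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1]
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n + phi) the operator output is the constant
# A^2 * sin(w)^2, so a change in amplitude or frequency shows up as a jump.
tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
energy = teager_kaiser(tone)
```

Because the output depends on only three neighbouring samples, a sudden change in the signal is visible within a couple of samples, which is why the operator suits onset detection.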


Journal ArticleDOI
TL;DR: This brief presents a novel pipelined FFT processor for the FFT computation of two independent data streams based on the multipath delay commutator FFT architecture, which requires a lower number of registers and has high throughput.
Abstract: Nowadays, many applications require simultaneous computation of multiple independent fast Fourier transform (FFT) operations with their outputs in natural order. Therefore, this brief presents a novel pipelined FFT processor for the FFT computation of two independent data streams. The proposed architecture is based on the multipath delay commutator FFT architecture. It has an $N/2$-point decimation in time FFT and an $N/2$-point decimation in frequency FFT to process the odd and even samples of two data streams separately. The main feature of the architecture is that the bit reversal operation is performed by the architecture itself, so the outputs are generated in normal order without any dedicated bit reversal circuit. The bit reversal operation is performed by the shift registers in the FFT architecture by interleaving the data. Therefore, the proposed architecture requires a lower number of registers and has high throughput.

28 citations
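
To make the "bit reversal" in the abstract above concrete, the following sketch lists the order in which a plain in-place radix-2 FFT produces its outputs. It only illustrates the permutation that a dedicated reordering circuit would otherwise have to undo; the helper names are assumptions and nothing here models the proposed pipeline.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reversed_order(n):
    """Bit-reversed index sequence for a power-of-two length n."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]

print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the permutation twice restores natural order, so the same routine serves for both reordering directions.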


Journal ArticleDOI
TL;DR: A novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs) without compensation circuits, even when using nonunity-gain rotators, by a joint design of rotators.
Abstract: In this brief, we propose a novel approach to implement multiplierless unity-gain single-delay feedback fast Fourier transforms (FFTs). Previous methods achieve unity-gain FFTs by using either complex multipliers or nonunity-gain rotators with additional scaling compensation. Conversely, this brief proposes unity-gain FFTs without compensation circuits, even when using nonunity-gain rotators. This is achieved by a joint design of rotators, so that the entire FFT is scaled by a power of two, which is then shifted to unity. This reduces the amount of hardware resources of the FFT architecture, while having high accuracy in the calculations. The proposed approach can be applied to any FFT size, and various designs for different FFT sizes are presented.

26 citations


Journal ArticleDOI
TL;DR: Among various discrete transforms, discrete Fourier transformation (DFT) is the most important technique that performs Fourier analysis in various practical applications, such as digital signal processing, wireless communications, to name a few.
Abstract: Discrete Fourier transform (DFT) is an important transformation technique in signal processing tasks. Due to its ultrahigh computing complexity as $O(N^2)$, $N$-point DFT is usually implemented in the format of fast Fourier transformation (FFT) with the complexity of $O(N\log N)$. Despite this significant reduction in complexity, the hardware cost of the multiplication-intensive $N$-point FFT is still very prohibitive, particularly for many large-scale applications that require large $N$. This brief, for the first time, proposes high-accuracy low-complexity scaling-free stochastic DFT/FFT designs. With the use of the stochastic computing technique, the hardware complexity of the DFT/FFT designs is significantly reduced. More importantly, this brief presents the scaling-free stochastic adder and the random number generator sharing scheme, which enable a significant reduction in accuracy loss and hardware cost. Analysis results show that the proposed stochastic DFT/FFT designs achieve much better hardware performance and accuracy performance than state-of-the-art stochastic design.

26 citations


Journal ArticleDOI
TL;DR: A configurable floating-point FFT accelerator based on CORDIC rotation is proposed, in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory.
Abstract: Fast Fourier transform (FFT) accelerators and the coordinate rotation digital computer (CORDIC) algorithm play important roles in signal processing. We propose a configurable floating-point FFT accelerator based on CORDIC rotation, in which twiddle direction prediction is presented to reduce hardware cost and twiddle angles are generated in real time to save memory. To perform CORDIC rotation efficiently, segmented-parallel iteration and compressed iteration based on CSA are presented, and redundant CORDIC is used to reduce the latency of each iteration. To prove the efficiency of our FFT accelerator, four FFT accelerators are prototyped on an FPGA chip to perform a batch FFT. Experimental results show that our structure, which is composed of four butterfly units and computes FFTs with sizes ranging from 64 to 8192 points, occupies 33230 (3%) REGs and 143006 (30%) LUTs. The clock frequency can reach 122 MHz. The resource usage of the double-precision FFT is only about 2.5 times that of single precision, whereas the theoretical value is 4. Moreover, only 13331 cycles are required to compute an 8192-point double-precision FFT with four butterfly units in parallel.

18 citations
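
As background for the CORDIC-based twiddle rotation described in the abstract above, here is a minimal floating-point sketch of the basic rotation-mode CORDIC iteration. It is the textbook algorithm only: the twiddle direction prediction, segmented-parallel iteration, and redundant arithmetic from the paper are not modelled, and the function name and iteration count are assumptions.

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) counterclockwise by `angle` radians, |angle| <= pi/2.

    Uses only shift-and-add style micro-rotations by atan(2^-i),
    followed by one division by the accumulated CORDIC gain.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction for this step
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]                   # remaining angle to rotate
    return x / gain, y / gain
```

Multiplying a sample by the twiddle factor exp(-j*theta) corresponds to calling `cordic_rotate(re, im, -theta)`, which replaces a complex multiplier with additions and shifts plus a fixed gain correction.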


Proceedings ArticleDOI
23 May 2016
TL;DR: The challenges and proposed effective solutions to efficiently port sFFT to massively parallel processors, such as GPUs, using CUDA are explored and some of the optimization strategies such as index coalescing, loop splitting, asynchronous data layout transformation, linear time selection algorithm are presented.
Abstract: The Fast Fourier Transform (FFT) is one of the most important numerical tools widely used in many scientific and engineering applications. The algorithm performs O(nlogn) operations on n input data points in order to calculate only small number of k large coefficients, while the rest of n - k numbers are zero or negligibly small. The algorithm is clearly inefficient, when n points input data lead to only k

18 citations


Proceedings ArticleDOI
19 Aug 2016
TL;DR: This work develops a hexagonal FFT in ASA coordinates that uses only the standard Fourier transform, allowing the user to implement the hexagonally sampled FFT using standard FFT routines.
Abstract: The discrete Fourier transform is an important tool for processing digital images. Efficient algorithms for computing the Fourier transform are known as fast Fourier transforms (FFTs). One of the most common of these is the Cooley-Tukey radix-2 decimation algorithm that efficiently transforms one-dimensional data into its frequency domain representation. The orthogonality of rectangular sampling allows the separability of the Fourier kernel which enables the use of the Cooley-Tukey algorithm on two-dimensional digital images that have been sampled rectangularly. Hexagonal sampling provides many benefits over rectangular sampling, but it does not result in the orthogonal rows and columns that can be transformed independently as is done with rectangular samples. Use of the Array Set Addressing (ASA) coordinate system for hexagonally sampled images has been shown to provide a separable Fourier kernel, leading to an efficient FFT, however its implementation is composed of nonstandard transforms that require custom routines to evaluate. This work develops a hexagonal FFT in ASA coordinates that uses only the standard Fourier transform, allowing the user to implement the hexagonal FFT using standard FFT routines.

Proceedings ArticleDOI
22 May 2016
TL;DR: This brief presents the scaling-free stochastic adder and the random number generator sharing scheme, which enable a significant reduction in accuracy loss and hardware cost and achieve much better hardware performance and accuracy performance than state-of-the-art Stochastic design.
Abstract: Among various discrete transforms, discrete Fourier transformation (DFT) is the most important technique that performs Fourier analysis in various practical applications, such as digital signal processing and wireless communications, to name a few. Due to its ultra-high computing complexity of $O(N^2)$, in practice the $N$-point DFT is usually performed in the form of fast Fourier transformation (FFT) with a complexity of $O(N\log N)$. Despite this significant reduction in computing complexity, the hardware cost of the multiplication-intensive $N$-point FFT is still very prohibitive, especially for many large-scale applications that require large $N$.

Proceedings ArticleDOI
18 May 2016
TL;DR: Analysis results show that compared with the conventional design, the proposed two 256-point stochastic DFT designs achieve 76% and 62% reduction in area, respectively, and show much stronger error-resilience, which is very attractive in nanoscale CMOS era.
Abstract: Discrete Fourier Transformation (DFT)/Fast Fourier Transformation (FFT) are the widely used techniques in numerous modern signal processing applications. In general, because of their inherent multiplication-intensive characteristics, the hardware implementations of DFT/FFT usually require a large amount of hardware resource, which limits their applications in area-constraint scenarios. To overcome this challenge, this paper, for the first time, proposes area-efficient error-resilient DFT designs using stochastic computing. By leveraging low-complexity stochastic multipliers, two types of stochastic DFT design are presented with significant reduction in overall area. Analysis results show that compared with the conventional design, the proposed two 256-point stochastic DFT designs achieve 76% and 62% reduction in area, respectively. More importantly, these stochastic DFT designs also show much stronger error-resilience, which is very attractive in nanoscale CMOS era.
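
The "low-complexity stochastic multipliers" in the abstract above rest on a standard stochastic-computing idea: a value in [-1, 1] is encoded as the probability of ones in a bitstream (bipolar format), and a single XNOR gate per bit multiplies two independent streams. The sketch below simulates that idea in software; the stream length, seed, and function names are assumptions, and it does not reproduce the paper's DFT designs.

```python
import random

def encode_bipolar(value, length, rng):
    """Encode value in [-1, 1] as a bitstream with P(bit = 1) = (value + 1) / 2."""
    p = (value + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode_bipolar(bits):
    """Recover the encoded value from the fraction of ones."""
    return 2.0 * sum(bits) / len(bits) - 1.0

def stochastic_multiply(a_bits, b_bits):
    """Bipolar stochastic multiplication: bitwise XNOR of two independent streams."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]

rng = random.Random(0)
a = encode_bipolar(0.5, 1 << 14, rng)
b = encode_bipolar(-0.6, 1 << 14, rng)
product = decode_bipolar(stochastic_multiply(a, b))  # close to -0.3
```

The result is an estimate whose error shrinks with stream length, and a flipped bit perturbs the value only slightly, which is the error resilience the abstract refers to.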

Journal ArticleDOI
TL;DR: A stable 2D sliding fast Fourier transform (FFT) algorithm based on the vector radix 2 × 2 FFT is presented and theoretical analysis shows that the proposed algorithm has the lowest computational requirements among the existing stable sliding DFT algorithms.
Abstract: The two-dimensional (2D) discrete Fourier transform (DFT) in the sliding window scenario has been successfully used for numerous applications requiring consecutive spectrum analysis of input signals. However, the results of conventional sliding DFT algorithms are potentially unstable because of the accumulated numerical errors caused by recursive strategy. In this letter, a stable 2D sliding fast Fourier transform (FFT) algorithm based on the vector radix (VR) 2 × 2 FFT is presented. In the VR-2 × 2 FFT algorithm, each 2D DFT bin is hierarchically decomposed into four sub-DFT bins until the size of the sub-DFT bins is reduced to 2 × 2; the output DFT bins are calculated using the linear combination of the sub-DFT bins. Because the sub-DFT bins for the overlapped input signals between the previous and current window are the same, the proposed algorithm reduces the computational complexity of the VR-2 × 2 FFT algorithm by reusing previously calculated sub-DFT bins in the sliding window scenario. Moreover, because the resultant DFT bins are identical to those of the VR-2 × 2 FFT algorithm, numerical errors do not arise; therefore, unconditional stability is guaranteed. Theoretical analysis shows that the proposed algorithm has the lowest computational requirements among the existing stable sliding DFT algorithms.
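
For context, the "conventional sliding DFT" that the abstract above contrasts itself with can be sketched in one dimension as a recursive update of each bin. This is the classic recursive form, shown as an assumption-labelled illustration; it is not the proposed VR-2 × 2 algorithm, and the repeated multiplication by a rounded complex exponential is exactly where the accumulated numerical error comes from.

```python
import cmath

def dft(window):
    """Direct DFT, used here to initialise and to check the recursion."""
    n = len(window)
    return [sum(window[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def sliding_dft_update(prev_bins, oldest, newest):
    """Advance every DFT bin by one sample.

    X_k(new) = (X_k(old) - oldest + newest) * exp(+j*2*pi*k/N)
    """
    n = len(prev_bins)
    return [(prev_bins[k] - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)
            for k in range(n)]

signal = [1, 3, -2, 0.5, 4, -1, 2, 2, -3, 1.5, 0, 5]
bins = dft(signal[0:8])
bins = sliding_dft_update(bins, signal[0], signal[8])  # now matches dft(signal[1:9])
```

Each update costs O(N) for all bins instead of recomputing a full transform, at the price of errors that carry over from one window to the next.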

Journal ArticleDOI
TL;DR: A new GPS signal acquisition method based on decomposition of FFT is proposed to improve the acquisition performance and is implemented, validated and compared with conventional serial search and radix2 FFT search algorithms using Intermediate Frequency GPS signal.

Proceedings ArticleDOI
04 Mar 2016
TL;DR: A designing scheme of high-speed real-time serial pipelined Fast Fourier Transform (FFT) processor on FPGA which is based on Coordinate Rotation Digital Computer (CORDIC) algorithm which will reduce the hardware complexity compared to the direct implementation of the butterflies using complex multipliers.
Abstract: This paper presents a design scheme for a high-speed real-time serial pipelined Fast Fourier Transform (FFT) processor on FPGA, based on the Coordinate Rotation Digital Computer (CORDIC) algorithm. The CORDIC algorithm reduces the hardware complexity compared to the direct implementation of the butterflies using complex multipliers. Moreover, the design uses the butterflies of the radix-2 Decimation-In-Time (DIT) algorithm, dual-port RAM, and a pipelined structure, which increase the performance of the FFT processor. The simulation results show that, compared with the same type of real-time FFT processor, the scheme presented in this paper reduces the hardware resource requirements in Adaptive Look-up Tables (ALUTs) and increases the Signal-to-Noise Ratio (SNR) by about 25 dB.

01 Jan 2016
TL;DR: This tutorial describes how to accurately measure signal power using the FFT and the different effects that introduce errors during FFT processing are described and how they can be avoided or compensated.
Abstract: This tutorial describes how to accurately measure signal power using the FFT. The different effects that introduce errors during FFT processing are described and it is explained how they can be avoided or compensated.
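
One basic relation behind FFT-based power measurement is Parseval's theorem: the mean power of N samples equals the sum of squared DFT magnitudes divided by N^2. The sketch below demonstrates only that relation under the assumption of coherent sampling (an integer number of cycles in the record, so no window or leakage correction is needed); it is not taken from the tutorial, and the function names are assumptions.

```python
import cmath
import math

def dft(x):
    """Direct DFT; adequate for a short illustrative record."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def mean_power_from_dft(x):
    """Mean power via Parseval: sum(|X[k]|^2) / N^2."""
    n = len(x)
    return sum(abs(v) ** 2 for v in dft(x)) / n ** 2

# A sinusoid of amplitude 3 with exactly 4 cycles in 32 samples.
tone = [3.0 * math.cos(2 * math.pi * 4 * i / 32) for i in range(32)]
power = mean_power_from_dft(tone)  # A^2 / 2 = 4.5
```

When the record does not hold an integer number of cycles, or when a window is applied, the per-bin readings need the corrections that the tutorial discusses.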

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This paper presents a solution of computing DFT using the dot-product engine (DPE) - a one transistor one memristor (1T1M) crossbar array with hybrid peripheral circuit support and the computing complexity is reduced to a constant O(λ) independent of the input data size.
Abstract: Discrete Fourier Transforms (DFT) are extremely useful in signal processing. Usually they are computed with the Fast Fourier Transform (FFT) method as it reduces the computing complexity from $O(N^2)$ to $O(N\log N)$. However, FFT is still not powerful enough for many real-time tasks which have stringent requirements on throughput, energy efficiency and cost, such as Internet of Things (IoT). In this paper, we present a solution of computing DFT using the dot-product engine (DPE) - a one transistor one memristor (1T1M) crossbar array with hybrid peripheral circuit support. With this solution, the computing complexity is further reduced to a constant O(λ) independent of the input data size, where λ is the timing ratio of one DPE operation compared to one real multiplication operation in digital systems.
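
The reason a dot-product engine can compute a DFT in one step is that the DFT is a single matrix-vector product with a fixed matrix. The sketch below writes that product out in software; the crossbar, its peripheral circuits, and its timing are not modelled, and the function names are assumptions.

```python
import cmath

def dft_matrix(n):
    """N x N DFT matrix with entries exp(-j*2*pi*k*m/N)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]

def matvec(matrix, vector):
    """Dense matrix-vector product: one dot product per output bin."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

F = dft_matrix(4)
spectrum = matvec(F, [1, 1, 1, 1])  # approximately [4, 0, 0, 0]
```

In software this costs O(N^2) multiplications; the abstract's claim is that an analog crossbar evaluates all N dot products concurrently, so the latency no longer grows with N.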

Proceedings ArticleDOI
01 Nov 2016
TL;DR: Reduced Fast Fourier Transformation (RFFT) is described, an algorithm of harmonic estimation based on the FFT, created by authors, convenient for voltage dips detection, and tested in Matlab / SimPowerSystems environment.
Abstract: The paper describes the Reduced Fast Fourier Transformation (RFFT), an algorithm for harmonic estimation based on the FFT, created by the authors and convenient for voltage dip detection. The algorithm is simple, fast, computationally inexpensive and sufficiently accurate. It is tested in the Matlab / SimPowerSystems environment. Results show that the algorithm is faster and better than the FFT, which is an advantage in applications for detecting network voltage disturbances.

Proceedings ArticleDOI
Pankaj Gupta
01 Mar 2016
TL;DR: A technique to estimate accurately the impact of fixed point arithmetic on FFT performance by measuring Signal-to-Quantization Noise Ratio (SQNR) of $2^n$ (= $N$) Radix-2 FFT implementation and presenting the simulation results to illustrate the accuracy of the theoretical analysis.
Abstract: Fast Fourier Transform (FFT) algorithm is widely used in today's digital signal processing applications. In practice, fixed point arithmetic is used for hardware implementations. The finite bits representation of signals introduces quantization error and thereby limits its accuracy. In this paper, we present a technique to estimate accurately the impact of fixed point arithmetic on FFT performance. We evaluate the fixed point accuracy by measuring Signal-to-Quantization Noise Ratio (SQNR) of $2^n$ (= $N$) Radix-2 FFT implementation. This SQNR analysis is used to determine fixed point precisions of the FFT implementation that provides a good trade-off between the required hardware resources and final FFT output signal integrity. In the end, we will present the simulation results to illustrate the accuracy of the theoretical analysis.

Proceedings ArticleDOI
20 Mar 2016
TL;DR: The Chisel hardware construction language has been used in this work to create a generator of runtime-reconfigurable $2^n 3^m 5^k$ FFT engines targeting software-defined radios (SDR) for modern communications, but with flexibility to support a wide range of applications.
Abstract: Runtime-reconfigurable, mixed-radix FFT/IFFT engines are essential for modern wireless communication systems. To comply with varying standards requirements, these engines are customized for each modem. The Chisel hardware construction language has been used in this work to create a generator of runtime-reconfigurable $2^n 3^m 5^k$ FFT engines targeting software-defined radios (SDR) for modern communications, but with flexibility to support a wide range of applications. The generator uses a conflict-free, in-place, multi-bank SRAM design, and exploits the duality of decimation-in-frequency (DIF) and decimation-in-time (DIT) FFTs to support continuous data flow with only 2N memory blocks. DFT decomposition using the prime-factor algorithm (PFA) followed by the Cooley-Tukey algorithm (CTA) reduces twiddle ROM sizes. A programmable Winograd's Fourier Transform (WFTA) butterfly supporting radix-2/3/4/5/7 operations reuses radix-7 hardware to support reconfigurability with minimal area penalty. The generated FFTs use 50% less memory than iterative FFTs from Spiral. The twiddle ROM size of the generated LTE/WiFi FFT engine is 16% smaller than that of a 2048-pt Spiral design.

Proceedings ArticleDOI
27 Jul 2016
TL;DR: In order to further improve the estimation precision of sinusoidal frequency, a new estimation method based on Fast Fourier Transform (FFT) is proposed, which has low SNR threshold, and outperforms the existing estimators.
Abstract: In order to further improve the estimation precision of sinusoidal frequency, a new estimation method based on the Fast Fourier Transform (FFT) is proposed. Zero-padding is used before the coarse estimation, and three sample values of the Discrete-Time Fourier Transform (DTFT) of the original signal are used to perform the fine estimation. Computer simulations show that the proposed estimation method follows the Cramer-Rao Bound over the whole region of frequency offset. The estimation precision is higher than that of the existing estimators. The proposed estimator has a low SNR threshold and outperforms the existing estimators.

Proceedings ArticleDOI
01 Jan 2016
TL;DR: Canonic Signed Digit (CSD) constant multiplier is used which minimizes the count of complex multipliers and twiddle factor memory size to achieve an optimized FFT processor in terms of area and memory requirements.
Abstract: In this paper a modified FFT (Fast Fourier Transform) processor using Mixed Radix DIT Algorithm is presented. Canonic Signed Digit (CSD) constant multiplier is used which minimizes the count of complex multipliers and twiddle factor memory size to achieve an optimized FFT processor in terms of area and memory requirements. Fixed point number representation has been used to minimize the memory consumption and I/O bandwidth. The proposed FFT processor codes are written in VHDL and synthesized using Xilinx ISE design tool of version 14.7. The used device is of Spartan-6 Family and the device targeted is XC6SLX45T. For the design verification purpose ISim simulator is used. The results have shown reduction in the hardware utilization and time delay.
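
The canonic signed digit (CSD) representation mentioned in the abstract above recodes a constant with digits from {-1, 0, +1} so that no two adjacent digits are nonzero, which minimises the number of add or subtract terms in a shift-and-add constant multiplier. The sketch below shows the standard conversion for non-negative integers; the function name is an assumption, and it is not taken from the paper's VHDL design.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # Choose +1 when value % 4 == 1 and -1 when value % 4 == 3,
            # so the next digit is guaranteed to be zero.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

print(to_csd(7))  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
```

Multiplying by 7 in plain binary needs two additions (4 + 2 + 1), whereas the CSD form needs one subtraction (8 - 1), and this saving is what makes CSD attractive for fixed twiddle constants.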

Journal ArticleDOI
TL;DR: In this paper, the computation of the perturbation-based electric field integral equation of the form $R^{n-1},~n = 0, 1, 2, \ldots$, is accelerated by using the fast Fourier transform (FFT) technique.
Abstract: In this communication, the computation of the perturbation-based electric field integral equation of the form $R^{n-1},~n = 0, 1, 2, \ldots$, is accelerated by using the fast Fourier transform (FFT) technique. As an effective solution of the low-frequency problem, the perturbation method employs the Taylor expansion of the scalar Green’s function in free space. However, multiple impedance matrices have to be solved at different frequency orders, and the computational cost becomes extremely high, especially for large-scale problems. Since the perturbed kernels still satisfy the Toeplitz property on the uniform Cartesian grid, the FFT based on Lagrange interpolation can be well incorporated to accelerate the multiple matrix-vector products. Because of the nonsingularity of the high-order kernels when $n\geq 1$, no near-field correction is needed. Finally, the efficiency of the proposed method is validated in an iterative solver with numerical examples.
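
The acceleration described in the abstract above relies on a general fact: a Toeplitz matrix-vector product can be computed with FFTs by embedding the Toeplitz matrix in a circulant one of twice the size. The sketch below shows that embedding in one dimension for power-of-two sizes. It is a generic illustration under those assumptions, with hypothetical function names; the Lagrange interpolation and the integral-equation kernels from the paper are not modelled.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def toeplitz_matvec(first_col, first_row, x):
    """Compute T @ x where T[i][j] = first_col[i-j] if i >= j else first_row[j-i]."""
    n = len(x)
    # First column of the 2n x 2n circulant that contains T in its top-left block.
    col = list(first_col) + [0] + list(first_row[:0:-1])
    padded = list(x) + [0] * n
    spectrum = [a * b for a, b in zip(fft(col), fft(padded))]
    y = fft(spectrum, inverse=True)
    return [v / (2 * n) for v in y[:n]]
```

The cost drops from O(n^2) for the dense product to O(n log n), and only the first row and column of the matrix need to be stored.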

Proceedings ArticleDOI
20 Nov 2016
TL;DR: Theoretical computing complexity of and some other similar operation is demonstrated, revealing an advantage on computation of CS-unit, which is equivalent to a combination of a convolutional layer and a pooling layer but more effective.
Abstract: The convolution operation is the most important and time-consuming step in a convolutional neural network model. In this work, we analyze the computing complexity of direct convolution and fast-Fourier-transform-based (FFT-based) convolution. We propose the CS-unit, which is equivalent to a combination of a convolutional layer and a pooling layer but more effective. Theoretical computing complexity of and some other similar operation is demonstrated, revealing a computational advantage of the CS-unit. Practical experiments are also performed, and the results show that the CS-unit holds a real advantage in run time. Keywords: computing complexity; FFT-based convolution; CS-unit
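
The two approaches compared in the abstract above can be placed side by side in a one-dimensional sketch: direct convolution costs O(N*M) multiplications, while the FFT-based route zero-pads both inputs, multiplies their spectra, and transforms back in O(L log L) for padded length L. This is a generic illustration with assumed function names; it does not implement the CS-unit or two-dimensional CNN layers.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def direct_convolution(a, b):
    """Linear convolution by the definition."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def fft_convolution(a, b):
    """Linear convolution of real sequences via zero-padded FFTs."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2                                  # pad to the next power of two
    fa = fft(list(a) + [0] * (n - len(a)))
    fb = fft(list(b) + [0] * (n - len(b)))
    y = fft([p * q for p, q in zip(fa, fb)], inverse=True)
    return [v.real / n for v in y[:size]]
```

For short kernels the direct form is usually cheaper, and the FFT route pays off as the kernel grows, which is the trade-off such a complexity analysis has to weigh.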

Journal ArticleDOI
Feng Han, Li Li, Kun Wang, Fan Feng, Hongbing Pan, Baoning Zhang, Guoqiang He, Jun Lin
TL;DR: An efficient architecture for performing 128 points to 1M points Fast Fourier Transformation (FFT) based on mixed radix-2/4/8 butterfly unit is presented and an efficient 2-epoch FFT solution is realized.
Abstract: This paper presents an efficient architecture for performing 128 points to 1M points Fast Fourier Transformation (FFT) based on mixed radix-2/4/8 butterfly unit. The proposed FFT architecture reduces the computation cost by taking the advantage of the radix-8 FFT algorithm while remaining compatible with sequences whose data length is an integral power of 2. Further optimizations for reconfigurable application specified processor are developed. First, we propose a separated radix-2/4/8 butterfly unit which is more flexible than an entire radix-2/4/8 butterfly unit; second, for the sequences longer than 256K points, an efficient 2-epoch FFT solution is realized. This FFT architecture is implemented in a reconfigurable application specified processor. The computation time of our architecture is 676 µs and 14.8 ms for 128K and 1M points FFTs respectively.

Proceedings ArticleDOI
04 Mar 2016
TL;DR: In this work two distinct complex multiplication approaches are discussed, and the corresponding results produced by these approaches are compared with MATLAB.
Abstract: This paper presents high-speed FFT algorithms for high-data-rate wireless personal area network applications. In a wireless personal area network, the FFT/IFFT block plays the major role. The computational requirements of FFT/IFFT processing are a heavy burden in most applications. The FFT/IFFT block can be implemented by various methods. From this work, we can recognize that most FFT structures follow divide-and-conquer algorithms, which improve the computational efficiency. In this work two distinct complex multiplication approaches are discussed, and the corresponding results produced by these approaches are compared with MATLAB. The pipelined architecture of Radix-2 SDF has been implemented with a 16-point module.

Proceedings ArticleDOI
20 May 2016
TL;DR: This paper presents area optimization of parallel pipelined radix-$2^2$ feed forward Fast Fourier transform (PPFFT) architecture and is compared with other PFFT (feed forward) architectures using the same synthesis tool and FPGA to show that the proposed architecture exhibits better area optimization.
Abstract: The design of pipelined Fast Fourier transform (PFFT) in modern communication systems provides an efficient way for computation of FFT with better area utilizing hardware architecture. Previously, the radix-$2^2$ had been used only for single path delay feedback architectures. Later with many types of research works the radix-$2^2$ was extended to multi-path delay commutator (MDC) architectures. This paper presents area optimization of parallel pipelined radix-$2^2$ feed forward Fast Fourier transform (PPFFT) architecture. This architecture is provided for parallelism value 4 and 16 sample points and the area of proposed PFFT is compared with other PFFT (feed forward) architectures using the same synthesis tool and FPGA. The comparison shows that the proposed architecture exhibits better area optimization.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: The butterfly of an analog of the Cooley-Tukey algorithm is provided, which requires fewer complex addition and multiplication operations than the standard method and runs 1.5 times faster than its analogue in Matlab.
Abstract: One- and two-dimensional (2D) fast Fourier transform (FFT) algorithms have been widely used in digital processing. Because complexity and the amount of computation grow with the dimension of the signal, the 2D discrete Fourier transform is usually reduced to a combination of one-dimensional FFTs over all coordinates. This article provides the butterfly of an analog of the Cooley-Tukey algorithm, which requires fewer complex addition and multiplication operations than the standard method and runs 1.5 times faster than its analogue in Matlab.
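
The "standard method" that the abstract above compares against, reducing a 2D DFT to one-dimensional transforms along each coordinate, can be sketched as follows. A direct 1D DFT stands in for the FFT to keep the example short, the function names are assumptions, and the article's own butterfly is not reproduced here.

```python
import cmath

def dft(x):
    """Direct 1D DFT; a real implementation would call an FFT here."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dft2_row_column(image):
    """2D DFT of a list of rows: transform every row, then every column."""
    rows = [dft(row) for row in image]
    cols = [dft(list(col)) for col in zip(*rows)]   # cols[v][u]
    return [list(r) for r in zip(*cols)]            # back to [u][v] layout

image = [[1, 2, 3, 4],
         [0, -1, 5, 2],
         [7, 0, 1, -3]]
spectrum = dft2_row_column(image)
```

The separability of the 2D kernel is what allows this row-then-column order, and a dedicated 2D butterfly aims to save operations relative to it.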

Journal ArticleDOI
TL;DR: This brief presents a hybrid structure to effectively compute the variable-length Fourier transform by employing the recursive and radix-$2^2$ fast algorithms, and the proposed hardware accelerator costs only four real multipliers and ten real adders, a substantial reduction compared with Kim et al.'s design.
Abstract: This brief presents a hybrid structure to effectively compute the variable-length Fourier transform by employing the recursive and radix-$2^2$ fast algorithms. After applying a hardware-sharing scheme to both fast algorithms, the proposed method not only mitigates the drawback of higher hardware cost in implementation but also retains the regular and flexible nature of the recursive discrete Fourier transform (RDFT). The proposed hardware accelerator costs only four real multipliers and ten real adders, a reduction of 86.7% and 66.7%, respectively, compared with Kim et al.'s design. In addition, the number of multiplications and additions for 256-point DFT computations can be reduced by 38.6% and 70%, respectively, compared to Lai et al.'s recent approach. For accuracy analysis, the SNR value of the proposed design is at least 4 dB higher than that of the other RDFT designs. For an overall evaluation, a very-large-scale integration chip design was further fabricated using the TSMC 0.18-$\mu$m 1P6M CMOS process. The core size was only 660 × 660 $\mu$m$^2$, and the measured power consumption was 8.8 mW @ 25 MHz. The result shows that the proposed chip, including data memory, achieves 1.38 times the computational efficiency per unit area of Lai et al.'s work. Therefore, it will be the state-of-the-art RDFT processor for various variable-transform-length digital signal processing applications.