Showing papers on "Prime-factor FFT algorithm" published in 2013


Proceedings Article
01 Oct 2013
TL;DR: A new algorithm to estimate a signal from its short-time Fourier transform modulus (STFTM) not only converges significantly faster than the traditional Griffin-Lim algorithm (GLA) but also recovers signals with a smaller error.
Abstract: In this paper, we present a new algorithm to estimate a signal from its short-time Fourier transform modulus (STFTM). This algorithm is computationally simple and is obtained by an acceleration of the well-known Griffin-Lim algorithm (GLA). Before deriving the algorithm, we will give a new interpretation of the GLA and formulate the phase recovery problem in an optimization form. We then present some experimental results where the new algorithm is tested on various signals. It not only shows a significant improvement in speed of convergence but also recovers the signals with a smaller error than the traditional GLA.

128 citations


Journal Article
TL;DR: A new deterministic algorithm for the sparse Fourier transform problem, in which the algorithm seeks to identify k ≪ N significant Fourier coefficients from a signal of bandwidth N, which is orders of magnitude faster than competing algorithms.
Abstract: We present a new deterministic algorithm for the sparse Fourier transform problem, in which we seek to identify k ≪ N significant Fourier coefficients from a signal of bandwidth N. Previous deterministic algorithms exhibit quadratic runtime scaling, while our algorithm scales linearly with k in the average case. Underlying our algorithm are a few simple observations relating the Fourier coefficients of time-shifted samples to unshifted samples of the input function. This allows us to detect when aliasing between two or more frequencies has occurred, as well as to determine the value of unaliased frequencies. We show that empirically our algorithm is orders of magnitude faster than competing algorithms.

69 citations
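
The time-shift observation in the sparse Fourier transform abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single noiseless tone, and the names `dft`, `locate_tone`, `N`, and `B` are hypothetical. Subsampling by a stride of N/B aliases a frequency f into bin f mod B; the phase ratio between the DFTs of the shifted and unshifted subsamples then reveals f itself.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, adequate for a tiny illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def locate_tone(x, B):
    """Recover the frequency of a single noiseless tone from 2*B samples.

    Hypothetical helper: assumes len(x) is a multiple of B and that
    exactly one frequency is present.
    """
    N = len(x)
    d = N // B                                        # subsampling stride
    Y0 = dft([x[d * m] for m in range(B)])            # unshifted subsamples
    Y1 = dft([x[(d * m + 1) % N] for m in range(B)])  # shifted by one sample
    j = max(range(B), key=lambda k: abs(Y0[k]))       # bin the tone aliased into
    phase = cmath.phase(Y1[j] / Y0[j])                # equals 2*pi*f/N modulo 2*pi
    return round(phase * N / (2 * cmath.pi)) % N

N, f_true = 64, 37
signal = [cmath.exp(2j * cmath.pi * f_true * t / N) for t in range(N)]
print(locate_tone(signal, B=8))  # prints 37
```

With two or more tones landing in the same bin, the ratio no longer has unit magnitude, which is one way such a collision could be flagged.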


Journal Article
TL;DR: A novel multiple-image encryption algorithm combines the log-polar transform with double random phase encoding in the fractional Fourier domain, obtaining high encryption efficiency while avoiding cross-talk.
Abstract: We present a novel multiple-image encryption algorithm by combining log-polar transform with double random phase encoding in the fractional Fourier domain. In this algorithm, the original images are transformed to annular domains by inverse log-polar transform and then the annular domains are merged into one image. The composite image is encrypted by the classical double random phase encoding method. The proposed multiple-image encryption algorithm takes advantage of the data compression characteristic of the log-polar transform to obtain high encryption efficiency while avoiding cross-talk. Optical implementation of the proposed algorithm is demonstrated and numerical simulation results verify the feasibility and the validity of the proposed algorithm.

66 citations


Journal Article
Taesang Cho, Hanho Lee
TL;DR: A novel modified radix-2^5 FFT algorithm that reduces the hardware complexity is proposed, which can reduce the number of complex multiplications and the size of the twiddle factor memory.
Abstract: This paper presents a high-speed low-complexity modified radix-2^5 512-point fast Fourier transform (FFT) processor using an eight data-path pipelined approach for high rate wireless personal area network applications. A novel modified radix-2^5 FFT algorithm that reduces the hardware complexity is proposed. This method can reduce the number of complex multiplications and the size of the twiddle factor memory. It also uses a complex constant multiplier instead of a complex Booth multiplier. The proposed FFT processor achieves a signal-to-quantization noise ratio of 35 dB at 12 bit internal word length. The proposed processor has been designed and implemented using 90-nm CMOS technology with a supply voltage of 1.2 V. The results demonstrate that the total gate count of the proposed FFT processor is 290 K. Furthermore, the highest throughput rate is up to 2.5 GS/s at 310 MHz while requiring much less hardware complexity.

63 citations


Proceedings Article
07 Jul 2013
TL;DR: The FFAST algorithm as mentioned in this paper is based on filterless subsampling of the input signal x using a small set of carefully chosen uniform sub-sampling patterns guided by the Chinese Remainder Theorem.
Abstract: Given an n-length input signal x, it is well known that its Discrete Fourier Transform (DFT), X, can be computed in O(n log n) complexity using a Fast Fourier Transform. If the spectrum X is exactly k-sparse (where k ≪ n), can we do better? We show that asymptotically in k and n, when k is sub-linear in n (i.e., k ∝ n^δ where 0 < δ < 1), and the support of the non-zero DFT coefficients is uniformly random, we can exploit this sparsity in two fundamental ways (i) sample complexity: we need only M = rk deterministically chosen samples of the input signal x (where r < 4 when 0 < δ < 0.99); and (ii) computational complexity: we can reliably compute the DFT X using O(k log k) operations, where the constants in the big Oh are small. Our algorithm succeeds with high probability, with the probability of failure vanishing to zero asymptotically in the number of samples acquired, M. Our approach is based on filterless subsampling of the input signal x using a small set of carefully chosen uniform subsampling patterns guided by the Chinese Remainder Theorem (CRT). Specifically, our subsampling operation on x is designed to create aliasing patterns on the spectrum X that "look like" parity-check constraints of good erasure-correcting sparse-graph codes. We show how computing the sparse DFT X is equivalent to decoding of these sparse-graph codes and is low in both sample complexity and decoding complexity. We accordingly dub our algorithm the FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm. In our analysis, we rigorously connect our CRT based graph constructions to random sparse-graph codes based on a balls-and-bins model and analyze the convergence behavior of the latter using well-studied density evolution techniques from coding theory. We provide simulation results in Section IV that corroborate our theoretical findings, and validate the empirical performance of the FFAST algorithm.

58 citations


Journal Article
TL;DR: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph.
Abstract: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals. The proposed computation is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. A new processing element (PE) is proposed using two radix-2 butterflies that can process four inputs in parallel. A novel conflict-free memory-addressing scheme is proposed to ensure the continuous operation of the FFT processor. Furthermore, the addressing scheme is extended to support multiple parallel PEs. The proposed real-FFT processor simultaneously requires fewer computation cycles and lower hardware cost compared to prior work. For example, the proposed design with two PEs reduces the computation cycles by a factor of 2 for a 256-point real fast Fourier transform (RFFT) compared to a prior work while maintaining a lower hardware complexity. The number of computation cycles is reduced proportionately with the increase in the number of PEs.

56 citations


Journal Article
TL;DR: Novel parallel pipelined architectures for the computation of the fast Fourier transform (FFT) of real signals and inverse FFT of Hermitian-symmetric signals using only real datapaths are presented.
Abstract: This brief presents novel parallel pipelined architectures for the computation of the fast Fourier transform (FFT) of real signals and inverse FFT of Hermitian-symmetric signals using only real datapaths. The real FFT structure is transformed by transferring twiddle factors to subsequent stages, such that each stage in the proposed flow graph contains one column of butterfly units and one column of twiddle factor blocks, and each column of the flow graph contains only N samples. This is a key requirement for the design of architectures that are based on only real datapaths. This structure is then mapped to pipelined architectures. The proposed architectures can be used with any FFT size or level of parallelism, which is a power of two. A systematic method to design architectures for FFTs with different levels of parallelism and radix values is presented. By modifying the FFT flow graph for real-valued samples, this methodology leads to architectures with fewer adders, delays, and interconnections.

50 citations


Proceedings Article
04 Sep 2013
TL;DR: Estimates show that the performance of the novel architecture designed to realize a million-bit multiplication architecture matches that of previously reported software implementations on a high-end 3 GHz Intel Xeon processor, while requiring only a tiny fraction of the area.
Abstract: In this work we present the first full and complete evaluation of a very large multiplication scheme in custom hardware. We designed a novel architecture to realize a million-bit multiplication architecture based on the Schönhage-Strassen Algorithm and the Number Theoretical Transform (NTT). The construction makes use of an innovative cache architecture along with processing elements customized to match the computation and access patterns of the FFT-based recursive multiplication algorithm. When synthesized using a 90nm TSMC library operating at a frequency of 666 MHz, our architecture is able to compute the product of integers in excess of a million bits in 7.74 milliseconds. Estimates show that the performance of our design matches that of previously reported software implementations on a high-end 3 GHz Intel Xeon processor, while requiring only a tiny fraction of the area.

41 citations
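
As a rough software illustration of NTT-based integer multiplication (the general idea behind the abstract above, not the paper's hardware design and not the full recursive Schönhage-Strassen scheme), the sketch below convolves decimal digits with a number theoretic transform. The prime 998244353 with primitive root 3 is a common choice assumed here; the function names are hypothetical.

```python
MOD = 998244353   # prime equal to 119 * 2**23 + 1 (assumed choice, not from the paper)
ROOT = 3          # a primitive root modulo MOD

def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                 # butterfly stages
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:                         # scale by 1/n for the inverse transform
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def multiply(x, y):
    """Multiply non-negative integers via NTT convolution of decimal digits.

    Valid while 81 * (number of digits) stays below MOD.
    """
    a = [int(c) for c in reversed(str(x))]
    b = [int(c) for c in reversed(str(y))]
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    ntt(a)
    ntt(b)
    c = [p * q % MOD for p, q in zip(a, b)]
    ntt(c, invert=True)
    # Summing coefficient * 10**i performs the carry propagation implicitly.
    return sum(coef * 10**i for i, coef in enumerate(c))

print(multiply(123456789, 987654321) == 123456789 * 987654321)  # prints True
```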


Journal Article
TL;DR: In this article, a fast butterfly algorithm for the hyperbolic Radon transform is proposed, which reformulates the transform as an oscillatory integral operator and constructs a blockwise low-rank approximation of the kernel function.
Abstract: Generalized Radon transforms, such as the hyperbolic Radon transform, cannot be implemented as efficiently in the frequency domain as convolutions, thus limiting their use in seismic data processing. We have devised a fast butterfly algorithm for the hyperbolic Radon transform. The basic idea is to reformulate the transform as an oscillatory integral operator and to construct a blockwise low-rank approximation of the kernel function. The overall structure follows the Fourier integral operator butterfly algorithm. For 2D data, the algorithm runs in complexity O(N^2 log N), where N depends on the maximum frequency and offset in the data set and the range of parameters (intercept time and slowness) in the model space. From a series of studies, we found that this algorithm can be significantly more efficient than the conventional time-domain integration.

37 citations


Journal Article
TL;DR: A unified hardware architecture that can be reconfigured to calculate 2, 3, 4, 5, or 7-point DFTs is presented and the processing element finds potential use in memory-based FFTs, where non-power-of-two sizes are required such as in DMB-T.
Abstract: A unified hardware architecture that can be reconfigured to calculate 2, 3, 4, 5, or 7-point DFTs is presented. The architecture is based on the Winograd Fourier transform algorithm and the complexity is equal to a 7-point DFT in terms of adders/subtractors and multipliers plus only seven multiplexers introduced to enable reconfigurability. The processing element finds potential use in memory-based FFTs, where non-power-of-two sizes are required such as in DMB-T.

27 citations
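
For readers unfamiliar with Winograd-style small DFT modules, here is a minimal sketch of a 3-point DFT that needs only two multiplications by constants. It illustrates the kind of building block the abstract above refers to, not the paper's reconfigurable architecture; the function name `winograd_dft3` is hypothetical.

```python
import cmath
import math

def winograd_dft3(x0, x1, x2):
    """3-point DFT with two constant multiplications (Winograd-style sketch)."""
    t1 = x1 + x2
    m0 = x0 + t1                                       # X[0], additions only
    m1 = (math.cos(2 * math.pi / 3) - 1) * t1          # real constant, equal to -3/2
    m2 = 1j * math.sin(2 * math.pi / 3) * (x2 - x1)    # purely imaginary constant
    s1 = m0 + m1
    return m0, s1 + m2, s1 - m2                        # X[0], X[1], X[2]

# Compare against the direct definition with w = exp(-2*pi*i/3).
xs = (1 + 2j, -3 + 0.5j, 0.25 - 4j)
w = cmath.exp(-2j * cmath.pi / 3)
direct = [sum(xs[t] * w ** (k * t) for t in range(3)) for k in range(3)]
print(all(abs(a - b) < 1e-12 for a, b in zip(winograd_dft3(*xs), direct)))  # prints True
```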


Proceedings Article
26 May 2013
TL;DR: Complexity analysis and experimental results show that this method outperforms FFT and sFFT and a top-down iterative strategy combined with different downsampling factors further saves computational costs.
Abstract: The Sparse Fast Fourier Transform (sFFT) [1][2] has recently been proposed to outperform the FFT in computational complexity. Assume that an input signal of length N in the frequency domain is K-sparse, where K ≤ N. sFFT costs O(K log N) instead of the O(N log N) of the FFT. In this paper, a new fast sFFT algorithm is proposed that costs O(K log K) on average, without any operations depending on N. The idea is to downsample the original input signal at the beginning. Subsequent processing operates on downsampled signals whose length is proportional to K. However, downsampling possibly leads to “aliasing.” By the shift theorem of the DFT, the aliasing problem can be formulated as the “Moment-preserving problem.” In addition, a top-down iterative strategy combined with different downsampling factors further saves computational costs. Complexity analysis and experimental results show that our method outperforms FFT and sFFT.

Journal Article
Jia-Ye Xie, Hou-Xing Zhou, Wei Hong, Wei-Dong Li, Guang Hua
TL;DR: In this paper, a novel realization of the Integral Equation in combination with the fast Fourier transform for the CFIE is established by fitting both the Green's function and its gradient onto the nodes of a uniform Cartesian grid.
Abstract: In this paper, a novel realization of the Integral Equation in combination with the fast Fourier transform for the CFIE is established by Fitting both the Green's function and its Gradient onto the nodes of a uniform Cartesian grid. The new method has been compared with several existing popular FFT-based methods, including the AIM, the IE-FFT, and the p-FFT. The accuracy of the proposed method is significantly higher than that of other FFT-based methods, and the method is insensitive to both the grid spacing and the expansion order. The outstanding merit of the proposed method is that the fitting procedure is independent of the basis functions. Therefore, when higher order basis functions are adopted in the method of moments, only one fitting procedure for the Green's function and its gradient on a basis function support is needed to serve all of the basis functions defined on this support. Some numerical examples are provided in this paper to demonstrate the accuracy and efficiency of the proposed method.

Proceedings Article
24 Oct 2013
TL;DR: A parameterized FFT architecture is proposed to identify the design trade-offs in achieving energy efficiency, and designs achieve up to 28% and 38% improvement in the energy efficiency and EAT, respectively, compared with a state-of-the-art design.
Abstract: In this paper, we revisit the classic Fast Fourier Transform (FFT) for energy efficient designs on FPGAs. A parameterized FFT architecture is proposed to identify the design trade-offs in achieving energy efficiency. We first perform design space exploration by varying the algorithm mapping parameters, such as the degree of vertical and horizontal parallelism, that characterize decomposition based FFT algorithms. Then we explore an energy efficient design by empirical selection on the values of the chosen architecture parameters, including the type of memory elements, the type of interconnection network and the number of pipeline stages. The trade offs between energy, area, and time are analyzed using two performance metrics: the energy efficiency (defined as the number of operations per Joule) and the Energy×Area×Time (EAT) composite metric. From the experimental results, a design space is generated to demonstrate the effect of these parameters on the various performance metrics. For N-point FFT (16 ≤ N ≤ 1024), our designs achieve up to 28% and 38% improvement in the energy efficiency and EAT, respectively, compared with a state-of-the-art design.

Journal Article
Weihua Zheng, Kenli Li
TL;DR: Novel order permutation of sub-DFTs and reduction of the number of arithmetic operations enhance the practicability of the proposed algorithm, which inherently provides a wider choice of accessible FFT lengths.
Abstract: The discrete Fourier transform (DFT) is widely used in many fields of science and engineering. The DFT is implemented with efficient algorithms categorized as fast Fourier transforms. A fast algorithm is proposed for computing a length-N = 6^m DFT. The proposed algorithm is a blend of radix-3 and radix-6 FFT. It is a variant of split radix and can be flexibly implemented for a length-2^r × 3^m DFT. Novel order permutation of sub-DFTs and reduction of the number of arithmetic operations enhance the practicability of the proposed algorithm. It inherently provides a wider choice of accessible FFT lengths.
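
To make the idea of FFT lengths of the form 2^r × 3^m concrete, the following is a minimal sketch of a plain recursive mixed-radix Cooley-Tukey FFT that peels off factors of 2 and 3. It is an assumption-laden illustration, not the split-radix variant proposed in the paper, and the name `fft_mixed` is hypothetical.

```python
import cmath

def fft_mixed(x):
    """Recursive decimation-in-time FFT for lengths of the form 2**r * 3**m.

    Plain mixed-radix Cooley-Tukey sketch; not the paper's split-radix variant.
    """
    n = len(x)
    if n == 1:
        return list(x)
    p = 2 if n % 2 == 0 else 3             # radix used at this level
    if n % p != 0:
        raise ValueError("length must be of the form 2**r * 3**m")
    m = n // p
    # Transform the p interleaved sub-sequences of length m.
    subs = [fft_mixed(x[r::p]) for r in range(p)]
    out = [0j] * n
    for k in range(n):
        # Combine sub-DFTs with twiddle factors exp(-2*pi*i*r*k/n).
        out[k] = sum(subs[r][k % m] * cmath.exp(-2j * cmath.pi * r * k / n)
                     for r in range(p))
    return out

# A unit impulse has a flat spectrum, so every output should be 1.
spectrum = fft_mixed([1, 0, 0, 0, 0, 0])
print(all(abs(v - 1) < 1e-12 for v in spectrum))  # prints True
```

A length such as 1536 = 2^9 × 3 is handled by the same recursion, while a length with any other prime factor raises `ValueError`.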

Book Chapter
01 Jan 2013
TL;DR: In this paper, the authors investigated discrete Fourier transform (DFT) and Fast Fourier Transform (FFT) algorithms to compute signal amplitude spectrum and power spectrum, and used the window function to reduce spectral leakage.
Abstract: This chapter investigates the discrete Fourier transform (DFT) and fast Fourier transform (FFT) and their properties; introduces the DFT/FFT algorithms to compute the signal amplitude spectrum and power spectrum; and uses the window function to reduce spectral leakage. Finally, the chapter describes the FFT algorithm and shows how to apply it to estimate a speech spectrum.
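
The effect of a window function on spectral leakage can be seen in a small sketch. The example below is illustrative only and not taken from the chapter: it assumes a 64-sample sine whose frequency falls halfway between two bins, uses a Hann window, and compares the amplitude spectrum far from the tone with and without windowing. The helper names are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(x):
    """One-sided amplitude spectrum |X[k]| / n computed with a naive DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2 + 1)]

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

N = 64
# 10.5 cycles per record: the tone sits between bins 10 and 11, so it leaks.
tone = [math.sin(2 * math.pi * 10.5 * t / N) for t in range(N)]
rect = amplitude_spectrum(tone)                                   # no window
windowed = amplitude_spectrum([s * w for s, w in zip(tone, hann(N))])

far_bin = 25  # well away from the tone
print(f"bin {far_bin}: rectangular={rect[far_bin]:.2e}, Hann={windowed[far_bin]:.2e}")
```

The Hann value at the distant bin comes out far smaller than the rectangular one, at the cost of a wider main lobe around bins 10 and 11.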

Proceedings Article
01 Dec 2013
TL;DR: A pipeline FFT architecture is proposed, which supports FFT lengths of power-of-two multiple of three and is memory optimal as for N-point transform only N - 1 memory locations are needed.
Abstract: Modern wireless communication systems use orthogonal frequency division multiplexing (OFDM) and multiple input multiple output (MIMO) schemes, which call for fast Fourier transforms (FFT). Traditionally power-of-two FFT lengths have been exploited but recently also non-power-of-two transform lengths have been defined. For example, 3GPP LTE specification defines 1536-point FFT. In this paper, we propose a pipeline FFT architecture, which supports FFT lengths of power-of-two multiple of three. The architecture is basically single delay feedback structure followed by radix-3 computation unit. The proposed architecture is memory optimal as for N-point transform only N - 1 memory locations are needed.

Proceedings Article
01 Aug 2013
TL;DR: Theoretical analysis and experimental results demonstrate that the algorithm is favorable, and the security of the proposed algorithm depends on the transformation algorithm, sensitivity to the randomness of phase mask and the orders of FRFT.
Abstract: In order to transmit image data over an open network, a novel image encryption algorithm based on fractional Fourier transform and block-based transformation is proposed in this paper. The image encryption process includes two steps: the original image was divided into blocks, which were rearranged into a transformed image using a transformation algorithm, and then the transformed image was encrypted using the fractional Fourier transform (FRFT) algorithm. The security of the proposed algorithm depends on the transformation algorithm, sensitivity to the randomness of phase mask and the orders of FRFT. Theoretical analysis and experimental results demonstrate that the algorithm is favorable.

Proceedings Article
25 Jun 2013
TL;DR: The algorithm proposed in this article transforms the initial correlation into two smaller correlations, and it is shown that the theoretical number of operations can be reduced by about 21%, and that the memory resources for an FPGA implementation can be almost halved.
Abstract: One of the methods for fast acquisition of GNSS signals is the parallel code-phase search, which uses the fast Fourier transform (FFT) to perform the correlation. A problem with this method is the potential sign transition that can happen between two code periods due to data or secondary code, leading to a loss of sensitivity or to the non-detection of the signal. A known straightforward solution consists in using two code periods instead of one for the correlation. However, in addition to increasing the complexity, this solution is not efficient since half of the points calculated are discarded. This led us to look for a more efficient algorithm. The algorithm proposed in this article transforms the initial correlation into two smaller correlations. When the radix-2 FFT is used, the proposed algorithm is more efficient for half of the possible sampling frequencies. It is shown for example that the theoretical number of operations can be reduced by about 21%, and that the memory resources for an FPGA implementation can be almost halved.
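
The baseline that this abstract starts from, correlation by transform in a parallel code-phase search, can be sketched as follows. This is only the standard one-period circular correlation, not the paper's two-smaller-correlations algorithm. A naive DFT stands in for the FFT, a Barker-13 sequence stands in for a real GNSS code, and all names are hypothetical.

```python
import cmath

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_correlation(received, code):
    """r[tau] = sum_n received[n + tau] * code[n], computed in the frequency domain."""
    n = len(received)
    R, C = dft(received), dft(code)
    prod = [a * b.conjugate() for a, b in zip(R, C)]
    return [v.real / n for v in dft(prod, sign=+1)]

# Barker-13 sequence used as a stand-in spreading code (assumption for illustration).
code = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
delay = 5
received = [code[(t - delay) % len(code)] for t in range(len(code))]  # delayed replica

corr = circular_correlation(received, code)
peak = max(range(len(corr)), key=lambda t: corr[t])
print(peak)  # prints 5, the code phase of the received signal
```

All code phases are tested at once by a single forward/inverse transform pair, which is what makes the search "parallel".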

Patent
Sheng Xu, Feng Chen
18 Apr 2013
TL;DR: In this paper, a vectorization scheme for high dimensional FFTs is presented, which has the best performance on the slowest or higher dimensions of data compared to conventional numerical scheme implementations.
Abstract: Numerical simulations of elastic wave propagation algorithms are critical components for seismic imaging and inversion. Finite-difference schemes yield good efficiency but cannot ensure the accuracy of the high frequency component. Pseudo-spectral algorithms are accurate up to the Nyquist frequency, but their efficiency depends on the optimization of the fast Fourier transform (FFT) algorithm. The conventional FFT algorithms are optimized for signal processing, in which problems are generally one dimensional time series. For 3D wave propagation, FFT algorithms have the potential to be further optimized. Under current computer hardware architecture, a vectorization scheme for high dimensional FFTs is presented. Compared to conventional numerical scheme implementations, the systems and methods disclosed herein have the best performance on the slowest or higher dimensions of data. For elastic wave propagation, vectorization improves the efficiency by more than a factor of two when compared to standard FFT algorithms.

Proceedings Article
01 Oct 2013
TL;DR: A split radix fast Fourier transform (FFT) algorithm consisting of mixed radix butterflies, whose structure is more regular than the conventional split radix algorithm and which requires fewer operations than the radix-4 algorithms.
Abstract: We present a split radix fast Fourier transform (FFT) algorithm consisting of radix-4 butterflies. The major advantages of the proposed algorithm include: i) The proposed algorithm consists of mixed radix butterflies, whose structure is more regular than the conventional split radix algorithm. ii) The proposed algorithm is asymptotically equal in computation amount to the split radix algorithm, and requires fewer operations than the radix-4 algorithms. iii) The proposed algorithm is in the conjugate-pair version, which requires less memory access than the conventional FFT algorithms.

Proceedings Article
15 Apr 2013
TL;DR: Novel algorithms for 2-D FFT and IFFT so that they may be realized in hardware to suit VLSI realization, where the processing speed is of paramount importance.
Abstract: High performance Fast Fourier Transform and Inverse Fast Fourier Transform are indispensable algorithms in the field of Digital Signal Processing. They are widely used in different areas of applications such as bio signal data compression, radars, image processing, voice processing etc. The FFT algorithm is computationally intensive and needs to be processed in real time for most applications. This paper presents novel algorithms for 2-D FFT and IFFT so that they may be realized in hardware. The algorithms have been developed to suit VLSI realization, where the processing speed is of paramount importance. The FFT and IFFT algorithms have been coded in MATLAB and successfully tested for 2D color images. The reconstructed images are indistinguishable from the original as can be seen from the results presented. The reconstructed quality of the images is better than 35 dB.

Proceedings Article
Lakshmi Santhosh, Anoop Thomas
04 Jul 2013
TL;DR: The Fast Fourier Transform (FFT) and its inverse (IFFT) are very important algorithms in digital signal processing and communication systems and these algorithms have been developed using Verilog hardware description language and implemented on Spartan6 FPGA.
Abstract: The Fast Fourier Transform (FFT) and its inverse (IFFT) are very important algorithms in digital signal processing and communication systems. Radix-2 FFT algorithm is the simplest and most common form of the Cooley-Tukey algorithm. Radix-2^2 FFT algorithm is an attractive algorithm having same multiplicative complexity as radix-4 algorithm, but retains the simple butterfly structure of radix-2 algorithm. These algorithms have been developed using Verilog hardware description language and implemented on Spartan6 FPGA.

Proceedings Article
26 Jun 2013
TL;DR: In this article, the authors considered the problem of efficient computations with structured polynomials and provided complexity results for computing Fourier Transform and truncated Fourier transform of symmetric polynomial.
Abstract: In this paper, we consider the problem of efficient computations with structured polynomials. We provide complexity results for computing Fourier Transform and Truncated Fourier Transform of symmetric polynomials, and for multiplying polynomials supported on a lattice.

Journal Article
TL;DR: The focus is on studying the analog of the Cooley-Tukey algorithm because the number of operations applied to calculate the n-dimensional FFT is considerably less than in the conventional algorithm.
Abstract: The one-dimensional fast Fourier transform (FFT) is the most popular tool for calculating the multidimensional Fourier transform. As a rule, to estimate the n-dimensional FFT, a standard method of combining one-dimensional FFTs, the so-called "by rows and columns" algorithm, is used in the literature. For fast calculations, different researchers try to use parallel calculation tools, the most successful of which are searches for the algorithms related to the computing device architecture: cluster, video card, GPU, etc. [1, 2]. The possibility of parallelizing another algorithm for FFT calculation, which is an n-dimensional analog of the Cooley-Tukey algorithm [3, 4], is studied in this paper. The focus is on studying the analog of the Cooley-Tukey algorithm because the number of operations applied to calculate the n-dimensional FFT is considerably less than in the conventional algorithm: $$nN^n \log_2 N$$ addition operations and $$\frac{2^n - 1}{2^n} N^n \log_2 N$$ multiplication operations, against $$N^{n+1} \log_2 N$$ addition operations and $$\frac{1}{2} N^{n+1} \log_2 N$$ multiplication operations in combining one-dimensional FFTs.

Proceedings Article
TL;DR: The decomposition of the FFT algorithm into the basic Butterfly operations is described, as this allows the algorithm to be fully implemented by the successive coherent addition and subtraction of two wavefronts, facilitating a simple and robust hardware implementation based on waveguided hybrid devices as employed in coherent optical detection modules.
Abstract: Optical structures to implement the discrete Fourier transform (DFT) and fast Fourier transform (FFT) algorithms for discretely sampled data sets are considered. In particular, the decomposition of the FFT algorithm into the basic Butterfly operations is described, as this allows the algorithm to be fully implemented by the successive coherent addition and subtraction of two wavefronts (the subtraction being performed after one has been appropriately phase shifted), so facilitating a simple and robust hardware implementation based on waveguided hybrid devices as employed in coherent optical detection modules. Further, a comparison is made to the optical structures proposed for the optical implementation of the quantum Fourier transform and they are shown to be very similar.

Proceedings Article
17 Nov 2013
TL;DR: The authors' parallel sFFT (PsFFT) implementation achieves approximately 60% parallel efficiency on a single 8-core Intel Sandy Bridge socket for relevant test cases and applies several techniques such as index coalescing, data affiliated loops and multi-level blocking techniques to alleviate memory access congestion and increase performance.
Abstract: The Fast Fourier Transform (FFT) is a widely used numerical algorithm. When N input data points lead to only k

Journal Article
TL;DR: The paired-transform based algorithm of the FFT is faster than the radix-2 FFT, consequently it is useful for higher sampling rates and also on the Virtex-II pro FPGAs.
Abstract: Frequency analysis plays a vital role in applications such as cryptanalysis, steganalysis, system identification, controller tuning, speech recognition, noise filters, etc. The Discrete Fourier Transform (DFT) is a principal mathematical method for frequency analysis. Different ways of splitting the DFT give rise to various fast algorithms. In this paper, we present the implementation of two fast algorithms for the DFT to evaluate their performance. One of them is the popular radix-2 Cooley-Tukey fast Fourier transform algorithm (FFT) (1) and the other one is the Grigoryan FFT based on the splitting by the paired transform (2). We evaluate the performance of these algorithms by implementing them on the TMS320C62x DSP and also on the Virtex-II pro FPGAs. Finally we show that the paired-transform based algorithm of the FFT is faster than the radix-2 FFT; consequently it is useful for higher sampling rates.

Proceedings Article
26 May 2013
TL;DR: A parallel implementation method of FFT-based full-search BMAs that can not only process in parallel, but also select the efficient FFT size and calculate two cross-correlations at the same time is proposed.
Abstract: One category of fast full-search block matching algorithms (BMAs) is based on the fast Fourier transformation (FFT). This paper proposes a parallel implementation method of FFT-based full-search BMAs. The FFT-based full-search BMAs are much faster than the direct full-search BMA, and their accuracy is the same as that of the direct full-search BMA. However, these are not designed for parallel processing. The proposed method divides the search window into multiple sub search windows using the overlap-save method, and the FFT-based full-search BMA is applied to each sub search window. These sub search windows are processed in parallel. By dividing the search window, the method can not only process in parallel, but also select the efficient FFT size. Furthermore, the method can also calculate two cross-correlations at the same time. These properties also contribute to speeding up the block matching. The experimental results show that the method on a 6-core CPU is about 11 times faster than the conventional FFT-based full-search BMA.

Proceedings Article
23 May 2013
TL;DR: The appropriate ordering of coefficients, based on the guidance given by the improved Anedma algorithm, can contribute to the reduction of Hamming distance of the encoded twiddle factors.
Abstract: This paper addresses the exploration of different heuristic algorithms for a better manipulation of twiddle factors of the Fast Fourier Transform (FFT). The FFT algorithm involves multiplications of input data with appropriate coefficients, hence the best ordering of those operations can contribute to reducing the switching activity, which leads to the minimization of power consumption in FFTs. The heuristic algorithm named Bellmore and Nemhauser, and a proposed one named Anedma in both original and improved versions, are used to get as near as possible to the optimal solution for the ordering and partitioning of coefficients in FFTs. Data encoding methods are used for decreasing switching activity when transmitting information over buses, hence we have applied some encoding techniques to the coefficients. As will be shown, the appropriate ordering of coefficients, based on the guidance given by the improved Anedma algorithm, can contribute to the reduction of Hamming distance of the encoded twiddle factors.

Journal Article
TL;DR: A parallel conflict-free access scheme for a constant geometry architecture which is unlike the previous schemes is proposed, which only uses one modular addition operation, and does not involve complicated operations, thus reducing the hardware complexity of address generation.
Abstract: In this paper, a parallel conflict-free access scheme for a constant geometry architecture which is unlike the previous schemes is proposed. The proposed method only uses one modular addition operation, and does not involve complicated operations, thus reducing the hardware complexity of address generation. Because of the reduction of the combinational logic which is used to generate the access address, the scheme also reduces the time delay and accordingly improves the executable frequency of fast Fourier transform (FFT) processors. In the scheme, we use an arbitrary radix, i.e., radix-r, to implement the scheme. The scheme is not only applicable to radix-r FFT processors with one butterfly unit, but is also suitable for FFT processors with multiple butterfly units. Because the same architecture is used for every stage of the constant geometry, it can enhance the flexibility of the FFT implementation. Finally, we analyze the resource costs and time delay of the proposed method, and the results verify the advantages of the proposed scheme.