
Showing papers on "Split-radix FFT algorithm published in 2005"


Journal ArticleDOI
TL;DR: The fast linear canonical transform (FLCT), an N log N algorithm for computing the discrete linear canonical transform (DLCT), is derived; it can be used for FFT, FRT, and FST calculations.
Abstract: The linear canonical transform (LCT) describes the effect of any quadratic phase system (QPS) on an input optical wave field. Special cases of the LCT include the fractional Fourier transform (FRT), the Fourier transform (FT), and the Fresnel transform (FST) describing free-space propagation. Currently there are numerous efficient algorithms used (for purposes of numerical simulation in the area of optical signal processing) to calculate the discrete FT, FRT, and FST. All of these algorithms are based on the use of the fast Fourier transform (FFT). In this paper we develop theory for the discrete linear canonical transform (DLCT), which is to the LCT what the discrete Fourier transform (DFT) is to the FT. We then derive the fast linear canonical transform (FLCT), an NlogN algorithm for its numerical implementation by an approach similar to that used in deriving the FFT from the DFT. Our algorithm is significantly different from the FFT, is based purely on the properties of the LCT, and can be used for FFT, FRT, and FST calculations and, in the most general case, for the rapid calculation of the effect of any QPS.

167 citations


Journal ArticleDOI
TL;DR: A new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy that can reduce hardware complexity and computation cycles compared with existing FFT processors is proposed.
Abstract: The paper proposes a new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy. The existing in-place strategy supports only a fixed-radix FFT algorithm. In contrast, the proposed in-place strategy can support the MR algorithm, which allows CF FFT computations regardless of the length of FFT. The novel in-place strategy is made by interchanging storage locations of butterfly outputs. The CFMR FFT processor provides the MR algorithm, the in-place strategy, and the CF FFT computations at the same time. The CFMR FFT processor requires only two N-word memories due to the proposed in-place strategy. In addition, it uses one butterfly unit that can perform either one radix-4 butterfly or two radix-2 butterflies. The CFMR FFT processor using the 0.18 μm SEC cell library consists of 37,000 gates excluding memories, requires only 640 clock cycles for a 512-point FFT and runs at 100 MHz. Therefore, the CFMR FFT processor can reduce hardware complexity and computation cycles compared with existing FFT processors.

128 citations


Journal ArticleDOI
Maria Eleftheriou, Blake G. Fitch, Aleksandr Rayshubskiy, T. J. C. Ward, R. S. Germain
TL;DR: The performance of the volumetric FFT is compared with that of the Fastest Fourier Transform in the West (FFTW) library; in general, the volumetric FFT outperforms a port of FFTW Version 2.1.5 on large-node-count partitions.
Abstract: This paper presents results on a communications-intensive kernel, the three-dimensional fast Fourier transform (3D FFT), running on the 2,048-node Blue Gene®/L (BG/L) prototype. Two implementations of the volumetric FFT algorithm were characterized, one built on the Message Passing Interface library and another built on an active packet Application Program Interface supported by the hardware bring-up environment, the BG/L advanced diagnostics environment. Preliminary performance experiments on the BG/L prototype indicate that both of our implementations scale well up to 1,024 nodes for 3D FFTs of size 128 × 128 × 128. The performance of the volumetric FFT is also compared with that of the Fastest Fourier Transform in the West (FFTW) library. In general, the volumetric FFT outperforms a port of the FFTW Version 2.1.5 library on large-node-count partitions.

61 citations


Journal ArticleDOI
TL;DR: A traced FFT Pruning method (TFFTP) is developed, which is a novel technique and does not require that the outputs be in continuous windows of the fast Fourier transform.
Abstract: The fast Fourier transform (FFT) is an essential tool in digital signal processing and communications. In the applications of the FFT where the required outputs are very sparse, for example, in digital filtering, one may only require the spectrum corresponding to certain bins of the FFT or in narrow frequency windows. In these cases, most of the FFT outputs are not required. Some pruning algorithms have been proposed to deal with such cases. However, most of the pruning algorithms require that the outputs be in continuous windows. This paper develops a traced FFT Pruning method (TFFTP), which is a novel technique and does not require this condition. Under some circumstances, considerable savings in computational complexity and power consumption can be realized using the TFFTP compared to the FFT. This paper derives the average number of butterflies that need to be executed when only $k_{in}$ input and/or $k_{out}$ output bins of an $N$-point FFT, where $k_{in} \le N$ and/or $k_{out} \le N$, are required. This method is then extended to arbitrary-radix FFT pruning and to the simultaneous input and output pruning case.

42 citations


Proceedings ArticleDOI
05 Dec 2005
TL;DR: A new general method to deduce FFT algorithms is introduced, and the deduced second radix-2 decimation-in-time FFT algorithm is transformed into another parallelizable sequential form, reducing the time complexity of DFT to O(nlogn/p) (where p is the number of processors).
Abstract: Discrete Fourier transform (DFT) has many applications in digital signal and image processing and other scientific and technological domains, but its time complexity of direct computation is $O(n^2)$, limiting greatly its application range. Thus many people have developed fast Fourier transform (FFT) algorithms, reducing the complexity from $O(n^2)$ to $O(n \log n)$ (in this paper, $\log n$ denotes $\log_2 n$). But for large n, $O(n \log n)$ is still very high. So multiprocessor systems have been used to speed up the computation of DFT. This paper first introduces a new general method to deduce FFT algorithms, then transforms the deduced second radix-2 decimation-in-time FFT algorithm into another parallelizable sequential form, and finally transforms the latter algorithm into a new parallel FFT algorithm, reducing the time complexity of DFT to $O(n \log n / p)$ (where p is the number of processors). Using similar methods, the authors can also design other new parallel 1-D and 2-D FFT algorithms.
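
For orientation, the radix-2 decimation-in-time structure named in this abstract can be sketched as follows. This is a minimal textbook recursion, not the paper's deduction method or its parallel form; the function names `dft_direct` and `fft_radix2_dit` are illustrative assumptions.

```python
import cmath


def dft_direct(x):
    """Direct O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with one twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The two half-size transforms do not depend on each other, which is the kind of independence a multiprocessor version can exploit.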

21 citations


Proceedings ArticleDOI
19 Jun 2005
TL;DR: Results show that the design and hardware implementation of a FFT-based algorithm using modular arithmetic to efficiently compute very large number multiplications starts to be useful for 4096-bit operands and beyond.
Abstract: Modular multiplication (MM) for large integers is the foundation of most public-key cryptosystems, specifically RSA, El-Gamal and the elliptic curve cryptosystems. Thus MM algorithms have been studied widely and extensively. Most works are based on the well-known Montgomery multiplication method (MMM) and its variants, which require multiplication in N. Authors have always avoided the fast Fourier transform (FFT) method, believing that it is impractical for present system sizes despite its smaller complexity order. In this paper, the authors present the design and hardware implementation of an FFT-based algorithm using modular arithmetic to efficiently compute very large number multiplications. The algorithm has been implemented in CASM, an intermediate-level HDL developed in the laboratory. The target architecture is an FPGA. The algorithm is scalable and can easily be mapped to any operand size. Results show that such an implementation starts to be useful for 4096-bit operands and beyond.

20 citations


Journal ArticleDOI
TL;DR: An efficient method for the realization of the paired algorithm for calculation of the one-dimensional (1-D) discrete Fourier transform (DFT), by simplifying the signal-flow graph of the transform, is described.
Abstract: An efficient method for the realization of the paired algorithm for calculation of the one-dimensional (1-D) discrete Fourier transform (DFT), by simplifying the signal-flow graph of the transform, is described. The signal-flow graph is modified by separating the calculation for real and imaginary parts of all inputs and outputs in the signal-flow graph and using properties of the transform. The examples for calculation of the eight- and 16-point DFTs are considered in detail. The calculation of the 16-point DFT of real data requires 12 real multiplications and 58 additions. Two multiplications and 20 additions are used for the eight-point DFT.

17 citations


Journal ArticleDOI
01 Sep 2005
TL;DR: This segment of a continuing series on the fast Fourier transform (FFT) deals with some aspects of the spectrum estimation problem.
Abstract: Each article in this continuing series on the fast Fourier transform (FFT) is designed to illuminate new features of the wide-ranging applicability of this transform. This segment deals with some aspects of the spectrum estimation problem.

17 citations


Patent
08 Aug 2005
TL;DR: A fast Fourier transform (FFT) system and method for a multi-mode wireless processing system are presented. The method can include loading an input vector into an input buffer, initializing a second counter and a variable N, where N = log2(input vector size) and s is the value of the second counter, performing an FFT stage, and comparing s to N and performing additional FFT stages until s = N.
Abstract: A fast Fourier transform (FFT) system and method in a multi-mode wireless processing system. The method can include loading an input vector into an input buffer, initializing a second counter and a variable N, where N = log2(input vector size) and s is the value of the second counter, performing an FFT stage, and comparing s to N and performing additional FFT stages until s = N. Performing the FFT stage can include performing vector operations on data in the input buffer and sending results to an output buffer, the data in the input buffer comprising a plurality of segments, advancing the value of the second counter, and switching roles of the input and output buffers. The vector operations can include performing radix-4 FFT vector operations on four input data at a time and multiplying the resulting output vectors with a twiddle factor.

17 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: It can be shown that all the possible split-radix FFT algorithms of the type radix-$2^r/2^{rs}$ for computing a $2^m$-point DFT require exactly the same number of arithmetic operations.
Abstract: A radix-2/16 decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithm and its higher radix version, namely radix-4/16 DIF FFT algorithm, are proposed by suitably mixing the radix-2, radix-4 and radix-16 index maps, and combining some of the twiddle factors. It is shown that the proposed algorithms and the existing radix-2/4 and radix-2/8 FFT algorithms require exactly the same number of arithmetic operations (multiplications + additions). Moreover, by using techniques similar to these, it can be shown that all the possible split-radix FFT algorithms of the type radix-$2^r/2^{rs}$ for computing a $2^m$-point DFT require exactly the same number of arithmetic operations.
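
As background for the radix-2/4 case mentioned above, the conventional split-radix decomposition can be sketched as follows. This is a minimal decimation-in-time recursion for illustration only, not the radix-2/16 or radix-4/16 DIF algorithms the paper proposes; the names `dft_direct` and `fft_split_radix` are assumptions.

```python
import cmath


def dft_direct(x):
    """Direct O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_split_radix(x):
    """Recursive radix-2/4 split-radix FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    if n % 4:
        raise ValueError("length must be a power of two")
    # Radix-2 step on the even samples, radix-4 step on the odd samples.
    even = fft_split_radix(x[0::2])   # length n/2
    odd1 = fft_split_radix(x[1::4])   # length n/4, samples 4m + 1
    odd3 = fft_split_radix(x[3::4])   # length n/4, samples 4m + 3
    q = n // 4
    out = [0j] * n
    for k in range(q):
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * odd3[k]
        s = t1 + t3
        d = -1j * (t1 - t3)
        out[k] = even[k] + s
        out[k + 2 * q] = even[k] - s
        out[k + q] = even[k + q] + d
        out[k + 3 * q] = even[k + q] - d
    return out
```

Each L-shaped butterfly uses two twiddle multiplications (`t1`, `t3`) to produce four outputs, which is where the split-radix saving over plain radix-2 comes from.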

16 citations


Proceedings ArticleDOI
01 May 2005
TL;DR: This paper proposes a novel FFT-based finite field multiplier; the FFT performs polynomial multiplication in $O(n \log n)$ time compared to $O(n^2)$ for the classical method.
Abstract: Finite field multiplication is one of the most useful arithmetic operations and has applications in many areas such as signal processing, coding theory and cryptography. However, it is also one of the most time-consuming operations in both software and hardware, which makes it pertinent to develop a fast and efficient implementation. In this paper, we propose a novel FFT-based finite field multiplier to address this problem. The fast Fourier transform (FFT) is the collection of computationally efficient algorithms that perform the discrete Fourier transform (DFT). For our purposes, we will use its efficient computation for polynomial multiplication. The FFT performs polynomial multiplication in $O(n \log n)$ time compared to the classical method time of $O(n^2)$. The idea of using the FFT for finite field multiplication has been researched extensively, but to our knowledge, this is the first implementation in hardware.
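
The idea of multiplying polynomials through the FFT can be illustrated with a short sketch. Note the substitution: this example works over the integers with a complex floating-point FFT, whereas the paper targets finite fields in hardware, so it shows the general technique only. The helper names `_fft` and `poly_multiply` are assumptions.

```python
import cmath


def _fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    sign = 1 if invert else -1
    even = _fft(a[0::2], invert)
    odd = _fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def poly_multiply(p, q):
    """Multiply integer-coefficient polynomials given lowest degree first."""
    size = len(p) + len(q) - 1
    n = 1
    while n < size:
        n *= 2
    # Zero-pad so the cyclic convolution equals the linear one.
    fp = _fft(list(p) + [0] * (n - len(p)))
    fq = _fft(list(q) + [0] * (n - len(q)))
    # Pointwise product in the transform domain, then inverse transform.
    prod = _fft([u * v for u, v in zip(fp, fq)], invert=True)
    return [round((c / n).real) for c in prod[:size]]
```

For example, `poly_multiply([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x² by 4 + 5x. Rounding recovers exact integer coefficients only while floating-point error stays well below 0.5, which is one reason exact finite-field or modular transforms are of interest.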

Proceedings ArticleDOI
19 Sep 2005
TL;DR: A new fast algorithm using multilevel Taylor interpolation and the FFT (TI-FFT) has been developed to solve the near-field (NF) propagation problem for the planar scenario.
Abstract: A new fast algorithm using multilevel Taylor interpolation and the FFT (TI-FFT) has been developed to solve the near-field (NF) propagation problem for the planar scenario. The algorithm speeds the computation by grouping neighborhood regions in the spatial domain or the spectral domain through the Taylor interpolation (TI) method using the FFT technique. The CPU time increases as $O(N^2 \log_2 N^2)$ instead of the polynomial time $O(N^4)$ required for the Stratton-Chu formula for $N \times N$ observation points. The multilevel TI-FFT uses a sampling rate above the Nyquist rate as required by the FFT, while the Stratton-Chu formula requires a higher sampling rate because of the fast variation of the phase term. An accuracy of -50 dB for the multilevel TI-FFT algorithm is easily obtained and an accuracy of -70 dB is possible when the algorithm is optimized. The algorithm works particularly well for band-limited beam-like fields and "quasi-planar" surfaces.

Proceedings ArticleDOI
18 Mar 2005
TL;DR: A fast iterative algorithm, with computation based on the fast Fourier transform (FFT), is presented, which achieves better performance than traditional FFT-based deconvolution methods with an equal number of coefficients in the inverse filters.
Abstract: A fast iterative algorithm, with computation based on the fast Fourier transform (FFT), is presented. It can be used to control a sound field at several control points with a loudspeaker array from multiple reference signals. It designs an equalizer able to invert long FIR filters and which achieves better performance than traditional FFT-based deconvolution methods with an equal number of coefficients in the inverse filters.

Proceedings ArticleDOI
19 Dec 2005
TL;DR: This paper proposes an alternate instance of padding zeros to the data sequence that results in computational cost reduction to $O(pN \log_2 N)$ and can be used to achieve non-uniform upsampling that would zoom in or zoom out on a particular frequency band.
Abstract: The classical Cooley-Tukey fast Fourier transform (FFT) algorithm has the computational cost of $O(N \log_2 N)$, where N is the length of the discrete signal. Spectrum resolution is improved through padding zeros at the tail of the discrete signal. If $(p-1)N$ zeros are padded (where p is an integer) at the tail of the data sequence, the computational cost through FFT becomes $O(pN \log_2 pN)$. This paper proposes an alternate instance of padding zeros to the data sequence that results in computational cost reduction to $O(pN \log_2 N)$. It has been noted that this modification can be used to achieve non-uniform upsampling that would zoom in or zoom out on a particular frequency band. In addition, it may be used for pruning the spectrum, which would reduce resolution of an unimportant frequency band.

Journal ArticleDOI
TL;DR: It is demonstrated that the two-dimensional fast Fourier transform (FFT) is a useful algorithm due to its hierarchical structure and ability to determine the relative magnitudes of different spatial wavelengths in a material.
Abstract: This work is part of an effort to structurally integrate self-sensing functionality into smart composite materials using embedded microsensors and local network communication nodes. Here we address the issue of data management through the development of localized processing algorithms. We demonstrate that the two-dimensional fast Fourier transform (FFT) is a useful algorithm due to its hierarchical structure and ability to determine the relative magnitudes of different spatial wavelengths in a material. This may be applied, for example, to determine the global components of a strain field or temperature distribution. We develop two methods for implementing the distributed 2D FFT based on the radix-2 (row-column) and radix-2 × 2 (vector-radix) structures, and compare them in terms of computational requirements within a low power, low bandwidth network of microprocessors. Our results show that the vector-radix algorithm requires 50% fewer multiplications than the row-column algorithm when performed in a distributed manner. Since the most important information of the 2D FFT can often be found in the lowest frequency components, we develop pruning methods for the distributed row-column and vector-radix algorithms that reduce internode communication requirements by 50% in both cases. We conclude that the pruned version of the distributed vector-radix 2D FFT is the most efficient of the methods investigated for rapid signal identification in smart composite materials.

Proceedings ArticleDOI
20 Mar 2005
TL;DR: The adaptive matrix-transpose algorithm is efficient since it minimizes the overhead associated with transposing matrices by adaptively choosing the suitable radix based on data size, number of processors, start-up time, and the effective bandwidth.
Abstract: Computing fast Fourier transform (FFT) on parallel computers has the same communication requirement to transpose matrices one or more times. In this paper, we propose an efficient algorithm (the adaptive matrix-transpose algorithm) for transposing matrices, which is based on all-to-all communication. The adaptive matrix-transpose algorithm is efficient since it minimizes the overhead associated with transposing matrices by adaptively choosing the suitable radix based on data size, number of processors, start-up time, and the effective bandwidth. We study the effect of the adaptive matrix-transpose algorithm on the 6-step 1-D FFT using symmetric multiprocessors (SMP).

Journal ArticleDOI
TL;DR: A novel design-for-testability approach based on M-testability conditions for module-level systolic fast Fourier transform (FFT) arrays, which guarantees 100% single-module-fault testability with a minimum number of test patterns, is proposed.
Abstract: In this paper, we first propose a novel design-for-testability approach based on M-testability conditions for module-level systolic fast Fourier transform (FFT) arrays. Our M-testability conditions guarantee 100% single-module-fault testability with a minimum number of test patterns. Based on this testable design, fault-tolerant approaches at the bit level and the multiply-subtract-add (MSA) module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the $\mathrm{FFT}_{\mathrm{BIT}}$ network is constructed. Two types of reconfiguration schemes (Type-I $\mathrm{FFT}_{\mathrm{MSA}}$ and Type-II $\mathrm{FFT}_{\mathrm{MSA}}$) are proposed at the MSA module level. Since both the design for testability (DFT) and the design for yield (DFY) issues are considered at the same time for all these proposed approaches, the resulting architectures are simpler as compared with previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% and 1/2N for the $\mathrm{FFT}_{\mathrm{BIT}}$ network and the Type-II $\mathrm{FFT}_{\mathrm{MSA}}$ network, respectively. An experimental chip is also implemented to verify our approaches. Reliabilities and hardware overhead are also evaluated and compared with previous works.

Journal ArticleDOI
TL;DR: Based on this mapping algorithm, several 18-bit word-length 1024-point FFT processors implemented with TSMC 0.18 μm CMOS technology are given to demonstrate its scalability and high performance.
Abstract: Many parallel Fast Fourier Transform (FFT) algorithms adopt multiple stages architecture to increase performance. However, data permutation between stages consumes volume memory and processing time. One FFT array processing mapping algorithm is proposed in this paper to overcome this demerit. In this algorithm, arbitrary $2^k$ butterfly units (BUs) could be scheduled to work in parallel on $n = 2^s$ data ($k = 0, 1, \ldots, s-1$). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, with the increasing of BUs, not only does throughput increase linearly, system latency also decreases linearly. This array processing orientated architecture provides flexible tradeoff between hardware cost and system performance. In theory, the system latency is $(s \times 2^{s-k}) \times t_{clk}$ and the throughput is $n / (s \times 2^{s-k} \times t_{clk})$, where $t_{clk}$ is the system clock period. Based on this mapping algorithm, several 18-bit word-length 1024-point FFT processors implemented with TSMC 0.18 μm CMOS technology are given to demonstrate its scalability and high performance. The core area of 4-BU design is 2.991 × 1.121 mm² and clock frequency is 326 MHz in typical condition (1.8 V, 25°C). This processor completes 1024 FFT calculation in 7.839 μs.

01 Jan 2005
Abstract: THE HYBRID ARCHITECTURE PARALLEL FAST FOURIER

Patent
09 Feb 2005
TL;DR: A matrix prefetch buffer-based fast Fourier transform processor is proposed to reduce the quantization errors generated during operation.
Abstract: The present invention provides a fast Fourier transform processor, a dynamic scaling method, and a fast Fourier transform with a radix-8 algorithm. It reduces quantization errors generated during operation by using a matrix prefetch buffer-based fast Fourier transform processor. Taking the operation sizes of the matrix prefetch buffer as block sizes, the invention adjusts the signals against overflow according to the status of the signals in each block. It can systematically shunt the time of complex multiplication operations and reduce operation complexity in the butterfly units by utilizing a 3-step radix-8 fast Fourier transform algorithm and re-scheduling. Moreover, the present invention provides a fast Fourier transform processor for realizing the methods and algorithms mentioned above.

Proceedings ArticleDOI
28 Sep 2005
TL;DR: The methods analysed in this paper differ in how overflow of the results of mathematical operations (complex addition and complex multiplication) is prevented.
Abstract: This paper presents some methods for maintaining accuracy in the implementation of the fast Fourier transform on fixed-point DSPs, together with an analysis of their performance. The methods analysed in this paper differ in how overflow of the results of mathematical operations (complex addition and complex multiplication) is prevented. Depending on the capabilities of the specific fixed-point DSP, a suitable method can be chosen.

Patent
23 Sep 2005
TL;DR: In this article, an FFT apparatus for quickly processing input signals and method thereof is disclosed, where the input signals are processed in parallel through the N/4-point FFT units, and thus a quick process of the input signal can be performed.
Abstract: An FFT apparatus for quickly processing input signals and a method thereof are disclosed. In performing the FFT for processing N input signals, four N/4-point FFT units implemented by radix-2 single-path delay feedback (R2SDF) units perform the FFT with respect to the input signals, and a radix-4 computation unit performs a radix-4 computation with respect to the signals transferred from the N/4-point FFT units. Accordingly, the input signals are processed in parallel through the N/4-point FFT units, and thus the input signals can be processed quickly.

Patent
Kim Rounioja, Sien Ong
05 Apr 2005
TL;DR: In this article, the authors describe a method of computing a fast Fourier transform (FFT) using enhanced processor computational capabilities for more efficient and flexible implementation of an electronic device (e.g., a linear equalizer) based on that FFT computing.
Abstract: This invention describes a method of computing a fast Fourier transform (FFT) using enhanced processor computational capabilities for more efficient and flexible implementation of an electronic device (e.g., a linear equalizer) based on that FFT computing. A simple non-parallel instruction set processor (or just a non-parallel processor) containing complex multiplication and addition/subtraction capabilities is extended by adding additional registers and interconnects and a dedicated parallel instruction for calculating the FFT butterfly. The parallel instruction consists of orthogonal sub-instructions each controlling a section of the data path related to a corresponding section of the FFT butterfly.

Proceedings ArticleDOI
20 Mar 2005
TL;DR: The no-communication algorithm, a parallel algorithm for 1-D FFT without inter-processor communication, is presented; results show that it performs better than the 4-step FFT for relatively small data sizes.
Abstract: Computing 1-D fast Fourier transform (FFT) using the classical 4-step FFT on parallel computers requires intensive all-to-all communication. This all-to-all communication significantly reduces the performance of FFT. In this paper, we present the no-communication algorithm, a parallel algorithm for 1-D FFT without inter-processor communication. The advantage of this algorithm is the absence of all-to-all communication between processors. The disadvantage of this algorithm is the extra computation compared to the classical 4-step FFT. The no-communication algorithm has been implemented and tested on 8-node symmetric multiprocessors (SMP). The results show that the no-communication algorithm performs better than the 4-step FFT for relatively small data sizes. However, the 4-step FFT algorithm performs better than the no-communication algorithm for relatively large data sizes.
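
To make the baseline concrete, the classical four-step decomposition that this abstract compares against can be sketched serially as follows. This is a textbook formulation under the assumption N = n1 · n2, with direct DFTs standing in for the sub-transforms; it is not the paper's no-communication algorithm, and the names `dft` and `four_step_fft` are illustrative.

```python
import cmath


def dft(x):
    """Direct DFT, standing in for the short sub-transforms."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT via column DFTs, twiddles, row DFTs, and a transpose."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # View x as an n1-by-n2 row-major matrix: a[t1][t2] = x[n2 * t1 + t2].
    # Step 1: length-n1 DFT down each column t2, giving cols[t2][k1].
    cols = [dft([x[n2 * t1 + t2] for t1 in range(n1)]) for t2 in range(n2)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * t2 * k1 / n).
    for t2 in range(n2):
        for k1 in range(n1):
            cols[t2][k1] *= cmath.exp(-2j * cmath.pi * t2 * k1 / n)
    # Step 3: length-n2 DFT along each row k1, giving rows[k1][k2].
    rows = [dft([cols[t2][k1] for t2 in range(n2)]) for k1 in range(n1)]
    # Step 4: transposed read-out, X[k1 + n1 * k2] = rows[k1][k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

When the matrix is distributed across processors, switching between column-wise and row-wise access (steps 1 to 3) and the final transpose are the points where all-to-all communication typically arises.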

Journal ArticleDOI
TL;DR: Magnitude spectrum estimation of periodic signals by the discrete Fourier transform (DFT) or fast Fourier transform (FFT) and special finite impulse response (FIR) filters is compared with other methods of spectrum analysis based on the DFT/FFT and sample interpolation in the time domain (resampling).

Proceedings ArticleDOI
01 Jan 2005
TL;DR: The paper investigates the potential of the very fast Fourier transform (VFFT) for implementation of orthogonal frequency division multiplexing (OFDM) with HiperLAN/2 parameters, and presents a performance comparison with FFT-based OFDM over AWGN channels.
Abstract: The paper investigates the potential of the very fast Fourier transform (VFFT) for implementation of orthogonal frequency division multiplexing (OFDM) with HiperLAN/2 parameters, and presents a performance comparison with FFT-based OFDM over AWGN channels. The VFFT is a fast Fourier transform (FFT) algorithm, which can be implemented in a variety of ways, replacing the FFT either exactly (with floating-point accuracy) or at various levels of approximation. Approximate forms with lower complexity reduce computational load and system complexity, and lead to the VFFT-based OFDM (G-OFDM and n G-OFDM). n G-OFDM with different values of n-quantisation level trades system complexity against system performance.

Proceedings ArticleDOI
06 Dec 2005
TL;DR: In this article, the phase derivative FFT (PDFFT) was proposed to estimate the frequency of a sinusoid from the short time Fourier transform (STFT).
Abstract: This paper presents the phase derivative FFT (PDFFT), a computationally efficient algorithm for estimating the frequency of a sinusoid from the short time Fourier transform (STFT). Upon obtaining initial coarse estimates from the FFT of a given frame, the PDFFT makes further refinement to the frequency estimate using only the time derivative of the phase response. The algorithm is derived and is shown to require only 4 multiplies per peak. Single frequencies in the presence of noise are resolved well, outperforming the commonly used quadratically interpolated FFT (QIFFT) method even with zero-padding. The algorithm is then used to separate two sinusoids of close frequency proximity that appear as a single peak in the magnitude spectrum.

Proceedings ArticleDOI
17 Apr 2005
TL;DR: This paper proposes an FFT array processing mapping algorithm that provides flexible tradeoff between hardware cost and system performance, and with the increasing of BUs, not only does throughput increase linearly, system latency also decreases linearly.
Abstract: Many parallel Fast Fourier Transform (FFT) algorithms adopt multiple stages architecture to increase performance. However, data permutation between stages consumes volume memory and processing time. An FFT array processing mapping algorithm is proposed in this paper to overcome this demerit. In this algorithm, arbitrary $2^k$ butterfly units (BUs) could be scheduled to work in parallel on $n = 2^s$ data ($k = 0, 1, \ldots, s-1$). Because no inter-stage data transfer is required, memory consumption is reduced to 1/3 of the original algorithm. Moreover, with the increasing of BUs, not only does throughput increase linearly, system latency also decreases linearly. This array processing orientated architecture provides flexible tradeoff between hardware cost and system performance. An 18-bit word-length 1024-point FFT architecture with 4 BUs is given to demonstrate this mapping algorithm. The design is implemented with TSMC 0.18 μm CMOS technology. The core area is 2.99 × 1.12 mm² and clock frequency is 326 MHz in typical condition (1.8 V, 25°C). This processor could complete 1024 FFT calculation in 7.839 μs.

Book ChapterDOI
Xiaoxin Guo, Zhiwen Xu, Yinan Lu, Zhanhui Liu, Yunjie Pang
23 Aug 2005
TL;DR: This paper proposes a novel registration algorithm based on Pseudo-Polar Fast Fourier Transform and Analytical Fourier-Mellin Transform for the alignment of images differing in translation, rotation angle, and uniform scale factor that is accurate and robust regardless of white noise.
Abstract: This paper proposes a novel registration algorithm based on Pseudo-Polar Fast Fourier Transform (FFT) and Analytical Fourier-Mellin Transform (AFMT) for the alignment of images differing in translation, rotation angle, and uniform scale factor. The proposed algorithm employs the AFMT of the Fourier magnitude to determine all the geometric transformation parameters, using its invariance to translation and rotation. In addition, the proposed algorithm adopts a fast, high-accuracy conversion from Cartesian to polar coordinates based on the pseudo-polar FFT and the conversion from the pseudo-polar to the polar grid, which involves only 1D interpolations, and obtains a more significant improvement in accuracy than the conventional method using cross-correlation. Experiments show that the algorithm is accurate and robust regardless of white noise.

Proceedings ArticleDOI
18 Mar 2005
TL;DR: A fast recursive algorithm for computation of the running discrete Hartley transform (RDHT) is presented and provides substantial computational savings compared with the recursive RDFT algorithm.
Abstract: A fast recursive algorithm for computation of the running discrete Hartley transform (RDHT) is presented. This method is based on the relation between the running discrete Fourier transform (RDFT) and the RDHT. The number of operations for the proposed recursive algorithm is only 2/N (N=length of the transform) of the direct computation of the RDHT. It also provides substantial computational savings compared with the recursive RDFT algorithm. A transform-domain adaptive digital filter is implemented based on the presented algorithm. Simulation results of its implementation on an adaptive line enhancer are given to demonstrate the efficiency of the presented fast algorithm.
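
The relation between the running DFT and the running DHT that this abstract builds on can be sketched as follows. The example uses the standard sliding-DFT update and the identity H[k] = Re X[k] − Im X[k]; it is an illustrative assumption about the general approach, not the paper's algorithm or its operation count, and the names `dht_direct` and `running_dht` are hypothetical.

```python
import cmath
import math


def dht_direct(w):
    """Direct discrete Hartley transform of one window (reference only)."""
    n = len(w)
    return [
        sum(
            w[t] * (math.cos(2 * math.pi * k * t / n) + math.sin(2 * math.pi * k * t / n))
            for t in range(n)
        )
        for k in range(n)
    ]


def running_dht(x, n):
    """Yield the DHT of every length-n window of real sequence x, recursively."""
    twiddle = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    # DFT of the first window, computed directly once.
    spec = [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    yield [c.real - c.imag for c in spec]
    for m in range(len(x) - n):
        # Sliding-DFT update: drop x[m], add x[m + n], rotate each bin.
        delta = x[m + n] - x[m]
        spec = [(c + delta) * w for c, w in zip(spec, twiddle)]
        # For real input, the DHT is the real part minus the imaginary part of the DFT.
        yield [c.real - c.imag for c in spec]
```

Each window shift costs one complex multiply-add per bin instead of a full transform; over long runs the recursion accumulates rounding error, so practical implementations often reinitialize or damp it periodically.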