Showing papers on "Split-radix FFT algorithm" published in 2004


Journal ArticleDOI
TL;DR: This paper observes that one of the standard interpolation or "gridding" schemes, based on Gaussians, can be accelerated by a significant factor without precomputation and storage of the interpolation weights, which is of particular value in two- and three-dimensional settings.
Abstract: The nonequispaced Fourier transform arises in a variety of application areas, from medical imaging to radio astronomy to the numerical solution of partial differential equations. In a typical problem, one is given an irregular sampling of N data in the frequency domain and one is interested in reconstructing the corresponding function in the physical domain. When the sampling is uniform, the fast Fourier transform (FFT) allows this calculation to be computed in O(N log N) operations rather than O(N^2) operations. Unfortunately, when the sampling is nonuniform, the FFT does not apply. Over the last few years, a number of algorithms have been developed to overcome this limitation and are often referred to as nonuniform FFTs (NUFFTs). These rely on a mixture of interpolation and the judicious use of the FFT on an oversampled grid (A. Dutt and V. Rokhlin, SIAM J. Sci. Comput., 14 (1993), pp. 1368-1383). In this paper, we observe that one of the standard interpolation or "gridding" schemes, based on Gaussians, can be accelerated by a significant factor without precomputation and storage of the interpolation weights. This is of particular value in two- and three-dimensional settings, saving either 10^d N in storage in d dimensions or a factor of about 5-10 in CPU time (independent of dimension).

714 citations
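
The abstract above does not spell out how the Gaussian weights are obtained without storing them. One way to read it, offered here as an assumption rather than as the authors' code, is the algebraic identity exp(-(d - l h)^2 / (4 tau)) = exp(-d^2 / (4 tau)) * exp(d h / (2 tau))^l * exp(-(l h)^2 / (4 tau)), which needs only two exponentials per source point plus a small table shared by all points. The names `d`, `h`, `tau`, `spread`, and all function names below are hypothetical.

```python
import math

def precompute_e3(h, tau, spread):
    """Table exp(-(l*h)^2 / (4*tau)) for l = 0..spread, shared by every source point."""
    return [math.exp(-((l * h) ** 2) / (4.0 * tau)) for l in range(spread + 1)]

def gaussian_weights_direct(d, h, tau, spread):
    """Reference version: one exp() call per neighbouring grid point."""
    return [math.exp(-((d - l * h) ** 2) / (4.0 * tau))
            for l in range(-spread, spread + 1)]

def gaussian_weights_fast(d, h, tau, spread, e3):
    """Same weights using two exp() calls per source point and the shared table e3."""
    e1 = math.exp(-(d * d) / (4.0 * tau))   # depends only on the offset d
    e2 = math.exp(d * h / (2.0 * tau))      # base of the geometric factor
    weights = [0.0] * (2 * spread + 1)
    weights[spread] = e1 * e3[0]
    up, down = 1.0, 1.0
    for l in range(1, spread + 1):
        up *= e2        # e2 ** l by repeated multiplication
        down /= e2      # e2 ** (-l)
        weights[spread + l] = e1 * up * e3[l]
        weights[spread - l] = e1 * down * e3[l]
    return weights
```

In this sketch the table `e3` is the only stored quantity and its size is independent of the number of source points.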


Journal ArticleDOI
TL;DR: In this paper, a numerical procedure for the simulation of non-Gaussian surfaces has been developed, which can simulate surfaces with given skewness and kurtosis and with spectral density or auto-correlation function.

123 citations


Proceedings ArticleDOI
23 May 2004
TL;DR: In this paper, an efficient variable-length FFT processor architecture suitable for multi-mode and multi-standard OFDM communication systems is proposed, based on the radix-2^2 DIF FFT algorithm and also supporting non-power-of-4 FFT computation.
Abstract: In this paper, we propose an efficient variable-length FFT processor architecture suitable for multi-mode and multi-standard OFDM communication systems. The FFT processor is based on the radix-2^2 DIF FFT algorithm and also supports non-power-of-4 FFT computation. The design contains an efficient processing element (PE), which can execute radix-2^2 butterfly (BF) operations, as well as radix-2 BF operations. Moreover, in order to achieve high-performance variable-length FFT operations and data accesses, an efficient variable-length address generator and twiddle factor generator are designed. The design has the merits of low complexity and high speed performance. The designs consider seven different FFT lengths including 64, 256, 512, 1024, 2048, 4096, and 8192 points, which cover all the required FFT lengths by 802.11a, 802.16a, DAB, DVB-T, VDSL and ADSL.

60 citations
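
To illustrate why a processing element that handles both radix-2^2 and radix-2 butterflies covers the seven lengths listed above, here is a minimal sketch of a stage plan. It assumes, without confirmation from the abstract, that a length N = 2^m is split into floor(m/2) radix-2^2 stages plus one radix-2 stage when m is odd. The function name `stage_plan` is hypothetical.

```python
def stage_plan(n):
    """Return (radix_2x2_stages, radix_2_stages) for a power-of-two length n.

    Assumption: each radix-2^2 stage consumes two bits of log2(n), and one
    extra radix-2 stage handles the leftover bit when log2(n) is odd.
    """
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 2")
    m = n.bit_length() - 1          # m = log2(n)
    return m // 2, m % 2

if __name__ == "__main__":
    for length in (64, 256, 512, 1024, 2048, 4096, 8192):
        print(length, stage_plan(length))
```

Under this assumption, 64, 256, 1024, and 4096 need no radix-2 stage, while 512, 2048, and 8192 need exactly one.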


Journal ArticleDOI
TL;DR: A new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N = q × 2^m, where q is an odd integer.
Abstract: In this paper, a new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N = q × 2^m, where q is an odd integer. It reduces substantially the operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FFT algorithms. It is shown that the arithmetic complexity (multiplications+additions) of the proposed algorithm is, in most cases, the same as that of the existing split-radix FFT algorithm. The basic idea behind the proposed algorithm is the use of a mixture of radix-2 and radix-8 index maps. The algorithm is expressed in a simple matrix form, thereby facilitating an easy implementation of the algorithm, and allowing for an extension to the multidimensional case. For the structural complexity, the important properties of the Cooley-Tukey approach such as the use of the butterfly scheme and in-place computation are preserved by the proposed algorithm.

50 citations
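
For orientation, the "existing split-radix FFT algorithm" used as the baseline in the abstract above can be sketched as a recursion that mixes a radix-2 step on the even-indexed samples with a radix-4 step on the odd-indexed ones. The code below is the classic radix-2/4 decimation-in-time form for power-of-two lengths, not the radix-2/8 algorithm proposed in the paper. The names `split_radix_fft` and `naive_dft` are hypothetical.

```python
import cmath

def naive_dft(x):
    """O(N^2) reference DFT used only to check the fast version."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def split_radix_fft(x):
    """Recursive radix-2/4 split-radix FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    u = split_radix_fft(x[0::2])    # length n/2: even-indexed samples
    z = split_radix_fft(x[1::4])    # length n/4: samples 4j+1
    zp = split_radix_fft(x[3::4])   # length n/4: samples 4j+3
    q = n // 4
    out = [0j] * n
    for k in range(q):
        a = cmath.exp(-2j * cmath.pi * k / n) * z[k]        # W^k   * Z[k]
        b = cmath.exp(-2j * cmath.pi * 3 * k / n) * zp[k]   # W^3k  * Z'[k]
        s = a + b
        d = -1j * (a - b)
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] + d
        out[k + 3 * q] = u[k + q] - d
    return out
```

Comparing the result with `naive_dft` on a short input is a quick sanity check before trying longer transforms.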


Proceedings ArticleDOI
23 May 2004
TL;DR: These modified radix-4 and radix-8 algorithms provide savings of more than 33% and 42% respectively in the number of twiddle factor evaluations or accesses to the lookup table compared to the corresponding conventional FFT algorithms without imposing any additional complexity.
Abstract: In this paper, improved algorithms for radix-4 and radix-8 FFT are presented. This is achieved by re-indexing a subset of the output samples resulting from the conventional decompositions in the radix-4 and radix-8 FFT algorithms. These modified radix-4 and radix-8 algorithms provide savings of more than 33% and 42% respectively in the number of twiddle factor evaluations or accesses to the lookup table compared to the corresponding conventional FFT algorithms without imposing any additional complexity.

28 citations


Journal ArticleDOI
TL;DR: A fast algorithm for the evaluation of the Fourier transform of piecewise smooth functions with uniformly or nonuniformly sampled data by using a double interpolation procedure combined with the fast Fourier transform (FFT) algorithm is presented.
Abstract: In computational electromagnetics and other areas of computational science and engineering, Fourier transforms of discontinuous functions are often required. We present a fast algorithm for the evaluation of the Fourier transform of piecewise smooth functions with uniformly or nonuniformly sampled data by using a double interpolation procedure combined with the fast Fourier transform (FFT) algorithm. We call this the discontinuous FFT algorithm. For N sample points, the complexity of the algorithm is O(νNp + νN log(N)) where p is the interpolation order and ν is the oversampling factor. The method also provides a new nonuniform FFT algorithm for continuous functions. Numerical experiments demonstrate the high efficiency and accuracy of this discontinuous FFT algorithm.

26 citations


Patent
03 Dec 2004
TL;DR: In this paper, a Fast Fourier Transform (FFT) hardware implementation and method provides efficient FFT processing while minimizing the die area needed in an Integrated Circuit (IC), where the FFT hardware can implement an N point FFT, where N = r^n is a function of a radix (r), and the hardware implementation includes a sample memory having N/r rows, each storing r samples.
Abstract: A Fast Fourier Transform (FFT) hardware implementation and method provides efficient FFT processing while minimizing the die area needed in an Integrated Circuit (IC). The FFT hardware can implement an N point FFT, where N = r^n is a function of a radix (r). The hardware implementation includes a sample memory having N/r rows, each storing r samples. A twiddle factor memory can store k twiddle factors per row, where 0 < k

25 citations


Patent
07 Jan 2004
TL;DR: In this article, a digital signal processor structure for length-scalable Fast Fourier Transformation (FFT) is disclosed, in which a single processor element (single PE) and a simple and effective address generator are used to achieve length scalability, high performance, and low power consumption in a split-radix-2/4 FFT or IFFT module.
Abstract: A digital signal processor structure for performing length-scalable Fast Fourier Transformation (FFT) is disclosed. A single processor element (single PE) and a simple and effective address generator are used to achieve length scalability, high performance, and low power consumption in a split-radix-2/4 FFT or IFFT module. In order to meet different communication standards, the digital signal processor structure has run-time configuration to perform for different length requirements. Moreover, its execution time can fit the standards of Fast Fourier Transformation (FFT) or Inverse Fast Fourier Transformation (IFFT).

25 citations


Patent
21 Jun 2004
TL;DR: A single-path delay feedback pipelined fast Fourier transform processor comprising at least one set of triplet FFT stage means: a first stage with a radix-2 butterfly, a feedback memory, and a multiplication by unity; a second stage with a trivial coefficient pre-multiplication, a radix-2 butterfly, a feedback memory, and a multiplication by selectable unity or Wnn/8; and a third stage with a trivial coefficient pre-multiplication, a butterfly, a feedback memory, and a complex twiddle coefficient multiplication with coefficients determined using a twiddle factor decomposition technique.
Abstract: A single-path delay feedback pipelined fast Fourier transform processor comprising at least one set of triplet FFT stage means: a first FFT stage means comprising a radix-2 butterfly, a feedback memory, and a multiplication by unity; a second FFT stage means comprising a trivial coefficient pre-multiplication, a radix-2 butterfly, a feedback memory, and a multiplication by selectable unity or Wnn/8; and a third FFT stage means comprising a trivial coefficient pre-multiplication, a butterfly, a feedback memory, and a complex twiddle coefficient multiplication with coefficients determined using a twiddle factor decomposition technique.

20 citations


Patent
02 Nov 2004
TL;DR: In this paper, a modular pipeline algorithm and architecture for computing discrete Fourier transforms is described, where two pipeline √N-point fast Fourier transform modules are combined with a center element.
Abstract: A modular pipeline algorithm and architecture for computing discrete Fourier transforms is described. For an N point transform, two pipeline √N-point fast Fourier transform modules are combined with a center element. The center element contains memories, multipliers and control logic. Compared with a standard N point pipeline FFT, the modular pipeline FFT maintains the bandwidth of existing pipeline FFTs with reduced dynamic power consumption and reduced complexity of the overall hardware pipeline.

19 citations


Journal Article
TL;DR: A fast algorithm for the production of the permutation factor circulant matrices of order n, based on the fast Fourier transform (FFT), is presented, and its arithmetic complexity is O(n log_2 n).
Abstract: A fast algorithm for the production of the permutation factor circulant matrices of order n, based on the fast Fourier transform (FFT), is presented, and its arithmetic complexity is O(n log_2 n).

Proceedings ArticleDOI
01 Jan 2004
TL;DR: Both the theory and the simulations show that the pre-weighted zoom FFT method has lower computational complexity, a smaller memory requirement, and negligible error, and can meet the needs of real-time processing.
Abstract: This paper addresses the problem of fast computation of the ambiguity function using a new method based on the pre-weighted zoom FFT algorithm, which employs the zoom FFT technique and performs the weighting process beforehand, and thus gets rid of the extra computation. The computational complexity of the presented algorithm is compared with other methods and the simulation results are given. Both the theory and the simulations show that the pre-weighted zoom FFT method has lower computational complexity, a smaller memory requirement, and negligible error, and can meet the needs of real-time processing.

Journal ArticleDOI
TL;DR: The technique is capable of reducing the memory requirement by a factor of 6 to 16 depending on the number of modes used and the spatial distribution of scatterers and is simple to implement in an existing FFT T-matrix code.
Abstract: We present a memory-reduction technique for the fast Fourier transformation (FFT) T-matrix method. The technique exploits the configuration- and Fourier-space symmetry relations of the transverse spherical multipole translation coefficients whose storage drives the memory requirement. The technique is capable of reducing the memory requirement by a factor of 6 to 16 depending on the number of modes used and the spatial distribution of scatterers and is simple to implement in an existing FFT T-matrix code. We establish its accuracy and effectiveness by applying the technique to compute the RCS of aggregates of dielectric spheres.

Proceedings ArticleDOI
01 Nov 2004
TL;DR: This paper proposes a new efficient FFT architecture with structured pipeline for OFDM systems, based on the radix-2^4 algorithm, which achieved more than 60% area reduction when compared with the conventional programmable multiplier.
Abstract: This paper proposes a new efficient FFT architecture with structured pipeline for OFDM systems, based on the radix-2^4 algorithm. The pipeline architecture with the new algorithm has the same number of multipliers as that of the radix-2^2 algorithm. However, the multiplier complexity could be reduced by more than 30% by replacing half of the programmable multipliers with the newly proposed constant multipliers. A newly proposed complex constant multiplier can enhance the area/power efficiency of the design. From synthesis simulations of a standard 0.35 μm CMOS process, it achieved more than 60% area reduction when compared with the conventional programmable multiplier.

Proceedings ArticleDOI
06 Dec 2004
TL;DR: An efficiently pipelined radix-2 FFT architecture is proposed that doubles the throughput with significant hardware reduction; the utilization rate of the multipliers and the processing elements reaches 100%.
Abstract: A high throughput fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) processor for double-rate wireless LAN, based on double-rate OFDM communication systems, is proposed. It is an efficiently pipelined radix-2 FFT architecture, which doubles the throughput with significant hardware reduction. The utilization rate of the multipliers and the processing elements reaches 100%. The core size is 10 mm^2 with a power consumption of 208 mW at 20 MHz for data inputs with 15-bit word length, using 0.35 μm IP4M CMOS technology.

Proceedings ArticleDOI
01 Aug 2004
TL;DR: It is shown that the discrete triangle transform has, like the type III DCT, a Cooley-Tukey FFT type fast algorithm and an upper bound for the number of complex operations it requires.
Abstract: The discrete triangle transform (DTT) was recently introduced (Püschel, M. and Rötteler, M., Proc. ICASSP, 2004) as an example of a non-separable transform for signal processing on a two-dimensional triangular grid. The DTT is built from Chebyshev polynomials in two variables in the same way as the DCT, type III, is built from Chebyshev polynomials in one variable. We show that, as a consequence, the DTT has, like the type III DCT, a Cooley-Tukey FFT type fast algorithm. We derive this algorithm and an upper bound for the number of complex operations it requires. Similar to most separable two-dimensional transforms, the operations count of this algorithm is O(n^2 log(n)) for an input of size n × n.

Journal ArticleDOI
TL;DR: It is demonstrated that FFTM is an accurate method, and is generally more accurate than FMM for a given order of multipole expansion (up to the second order); the algorithm also shows approximately linear growth in computational complexity, implying that FFTM is as efficient as FMM.
Abstract: In this paper, we propose a new fast algorithm for solving large problems using the boundary element method (BEM). Like the fast multipole method (FMM), the speed-up in the solution of the BEM arises from the rapid evaluations of the dense matrix–vector products required in iterative solution methods. This fast algorithm, which we refer to as fast Fourier transform on multipoles (FFTM), uses the fast Fourier transform (FFT) to rapidly evaluate the discrete convolutions in potential calculations via multipole expansions. It is demonstrated that FFTM is an accurate method, and is generally more accurate than FMM for a given order of multipole expansion (up to the second order). It is also shown that the algorithm has approximately linear growth in the computational complexity, implying that FFTM is as efficient as FMM. Copyright © 2004 John Wiley & Sons, Ltd.

Proceedings ArticleDOI
02 May 2004
TL;DR: An improved radix-16 decimation-in-frequency (DIF) FFT algorithm is proposed by introducing new indices for some of the output sub-sequences resulting from the conventional radix-16 DIF decomposition of the DFT.
Abstract: An improved radix-16 decimation-in-frequency (DIF) FFT algorithm is proposed by introducing new indices for some of the output sub-sequences resulting from the conventional radix-16 DIF decomposition of the DFT. This improved radix-16 DIF FFT algorithm achieves savings of more than 46% in the number of twiddle factor evaluations or accesses to the lookup table and address generations compared to the conventional radix-16 DIF FFT algorithm. These savings are achieved without imposing any additional computational or structural complexity in the algorithm.

Patent
28 Dec 2004
TL;DR: In this article, a Fast Fourier Transform (FFT) processor is provided, which analyzes the input/output order of the fast Fourier transformation, separates the portions requiring complex computations, simplifies the hardware thereof and adjusts the output order.
Abstract: A Fast Fourier Transform (FFT) processor is provided. It comprises a multiplexer, a first angle rotator, a second angle rotation and multiplexing unit, an adder, a twiddle factor storage, a multiplier, and a data storage. The FFT processor analyzes the input/output order of the Fast Fourier Transformation, separates the portions requiring complex computations, simplifies the hardware thereof, and adjusts the output order. It not only effectively saves the hardware area, but also reduces the computations and memory access count. Thereby, the power consumption is reduced.

Patent
01 Dec 2004
TL;DR: In this paper, a system for efficiently filtering interfering signals in a front end of a GPS receiver is disclosed, where at least a portion of the interfering signals are removed by applying weights to the inputs.
Abstract: A system for efficiently filtering interfering signals in a front end of a GPS receiver is disclosed. Such interfering signals can emanate from friendly, as well as unfriendly, sources. One embodiment includes a GPS receiver with a space-time adaptive processing (STAP) filter. At least a portion of the interfering signals are removed by applying weights to the inputs. One embodiment adaptively calculates and applies the weights by Fourier Transform convolution and Fourier Transform correlation. The Fourier Transform can be computed via a Fast Fourier Transform (FFT). This approach advantageously reduces computational complexity to practical levels. Another embodiment utilizes redundancy in the covariance matrix to further reduce computational complexity. In another embodiment, an improved FFT and an improved Inverse FFT further reduce computational complexity and improve speed. Advantageously, embodiments can efficiently null a relatively large number of jammers at a relatively low cost and with relatively low operating power.

Proceedings ArticleDOI
12 May 2004
TL;DR: It is shown that a limited set of output discrete cosine transform (DCT) samples can be computed by a modified real-valued output-pruned FFT algorithm for appropriately permuted data samples.
Abstract: In the paper it is shown that a limited set of output discrete cosine transform (DCT) samples can be computed by a modified real-valued output-pruned FFT algorithm for appropriately permuted data samples. The same is true for the discrete sine transform (DST). Analogously, when computing the data contribution from a few DCT or DST samples, the input-pruned FFT algorithm for the inverse FFT can be applied, and the input-pruned algorithms for the inverse DCT or DST are obtained. The algorithms are very efficient: their complexities are O(N log K), where N is the transform size and K is a divisor of N equal to or greater than the number of computed transform samples, which is less than the O(N log N) of the full DCT or DST algorithm. The algorithms are easy to implement, too.

Proceedings ArticleDOI
24 Jun 2004
TL;DR: Two forms of optimisation; input data optimisation and FFT coefficients optimisation are investigated in this paper and the word length is optimised down to 10 bits for input data and 8 bits for the FFT coefficient.
Abstract: This paper describes the optimisation of the word length in a 16-point radix-4 reconfigurable pipelined fast Fourier transform (FFT) based receiver device. Two forms of optimisation; input data optimisation and FFT coefficients optimisation are investigated in this paper. The word length for input data and FFT coefficients are initially set to 16-bits. A genetic algorithm (GA) is then used to find the optimal word length for the input data and FFT coefficients while satisfying functionality constraints. The GA is able to determine an optimised word length down to 10 bits for input data and 8 bits for the FFT coefficients.

Proceedings ArticleDOI
13 Dec 2004
TL;DR: This paper proposes a hand-coded assembly implementation for the radix-2 DIF FFT algorithm with the twiddle-factor-based butterfly grouping method on a TI TMS320C64x DSP that is 8 times faster than the C implementation and slightly faster than the TI assembly benchmark while requiring only 50% of the memory references due to twiddle factors.
Abstract: The memory reference in digital signal processors (DSP) is among the most costly of operations due to its long latency and substantial power consumption. Previously proposed twiddle-factor-based butterfly grouping methods can effectively minimize memory references due to twiddle factors for implementing any existing fast Fourier transform (FFT) algorithms on DSP. However, the performance of its C implementation on DSP is far behind the corresponding TI assembly benchmark for radix-2 DIF FFT due to limitations of the compiler. In this paper, we propose a hand-coded assembly implementation for the radix-2 DIF FFT algorithm with the twiddle-factor-based butterfly grouping method on a TI TMS320C64x DSP. Experimental results show that for 1024-pt radix-2 DIF FFT, our hand-coded assembly implementation is 8 times faster than the C implementation and slightly faster than the TI assembly benchmark while requiring only 50% of memory references due to twiddle factors compared to the TI assembly benchmark.
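
The abstract above does not describe the grouping scheme in detail, so the following is only a simplified, per-stage illustration of the general idea: in a radix-2 DIF stage, every butterfly at the same offset within its block uses the same twiddle factor, so iterating over twiddles in the outer loop lets each factor be fetched once per stage. The function name `fft_radix2_dif_grouped` and the `twiddle_loads` counter are hypothetical, and the paper's actual method may group butterflies differently.

```python
import cmath

def fft_radix2_dif_grouped(x):
    """Radix-2 DIF FFT with butterflies grouped by twiddle factor in each stage.

    Returns (spectrum, twiddle_loads). len(x) must be a power of two.
    """
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    twiddle_loads = 0
    span = n // 2                       # distance between butterfly inputs
    while span >= 1:
        for j in range(span):           # outer loop: one twiddle per iteration
            w = cmath.exp(-2j * cmath.pi * j / (2 * span))
            twiddle_loads += 1
            for start in range(0, n, 2 * span):   # all butterflies sharing w
                top, bot = a[start + j], a[start + j + span]
                a[start + j] = top + bot
                a[start + j + span] = (top - bot) * w
        span //= 2
    # DIF leaves the results in bit-reversed order; undo that here.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        out[r] = a[i]
    return out, twiddle_loads
```

With this loop order the sketch fetches N - 1 twiddle factors in total, compared with (N/2) log2 N if a factor were fetched for every butterfly.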

Proceedings ArticleDOI
02 May 2004
TL;DR: It is shown that the proposed algorithm reduces the computational complexity significantly in comparison to the existing 3D vector radix FFT algorithms as well as algorithms that are based on row-column decomposition.
Abstract: We propose a 3D split vector-radix decimation-in-frequency (DIF) FFT algorithm for computing the 3D DFT, based on a mixture of radix-(2×2×2) and radix-(4×4×4) index maps. It is shown that the proposed algorithm reduces the computational complexity significantly in comparison to the existing 3D vector radix FFT algorithms as well as algorithms that are based on row-column decomposition. In addition, since the proposed algorithm is expressed in a simple matrix form using the Kronecker product, it facilitates easy software or hardware implementation of the algorithm.

Journal Article
TL;DR: A parallel architecture for the implementation of the radix 4 and mixed radix FFT algorithm is presented and the dedicated parallel memory mapping algorithm with the feature of minimal memory size relies on the in place calculation property of the FFT algorithms.
Abstract: A parallel architecture for the implementation of the radix-4 and mixed-radix FFT algorithm is presented. The dedicated parallel memory mapping algorithm, with the feature of minimal memory size, relies on the in-place calculation property of the FFT algorithm and can simultaneously access all the data needed for the calculation of each butterfly. The address generation of twiddle factors needs only simple operations in this algorithm. The hardware complexity of the butterfly processor is reduced by using a 3-real-multiplier algorithm for the complex multiplier. The processor can be configured for transforms of length N, where N is a power of two. The implementation is on an Altera chip EP200K400E using Altera Quartus II 2.0. Operating at an 89 MHz clock frequency, the processor computes a complex 1024-point FFT within 14.1 μs and a 4096-point FFT within 67 μs.
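
The abstract above mentions a three-real-multiplier scheme for complex multiplication without giving its form. A well-known identity of this kind, shown here as a sketch and not necessarily the variant used in the paper, trades one of the four real multiplications for extra additions. The function name `complex_mul_3` is hypothetical.

```python
def complex_mul_3(a, b, c, d):
    """Compute (a + ib) * (c + id) with three real multiplications.

    Returns (real, imag). Uses 3 multiplications and 5 additions/subtractions
    instead of the usual 4 multiplications and 2 additions.
    """
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2    # real = ac - bd, imag = ad + bc

if __name__ == "__main__":
    print(complex_mul_3(3, 4, 5, 6))   # (3 + 4i)(5 + 6i) = -9 + 38i
```

When the second operand is a fixed twiddle factor, the sums `d - c` and `c + d` can be stored in advance, leaving three multiplications and three additions at run time.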

Journal ArticleDOI
01 Feb 2004
TL;DR: This paper discusses architecture-specific performance tuning for fast Fourier transforms (FFTs) implemented in the UHFFT library, an adaptive and portable software library for FFTs developed by the authors.
Abstract: In this paper we discuss architecture-specific performance tuning for fast Fourier transforms (FFTs) implemented in the UHFFT library. The UHFFT library is an adaptive and portable software library for FFTs developed by the authors. We present the optimization methods used at different levels, starting with the algorithm selection used for the library code generation and ending with the actual implementation and specification of the appropriate compiler optimization options. We report on the performance results for several modern microprocessor architectures.

Book ChapterDOI
06 Jun 2004
TL;DR: The numerical algorithm by use of SEL is improved with FFT and the calculation speed is faster than the previous one and the limit function of approximate solutions satisfied the diffraction problem in the sense of distribution.
Abstract: A direct solver for diffraction problems is presented in this paper. The solver is based on the fast Fourier transform (FFT) and the successive elimination of lines which we call SEL. In the previous paper, we showed the numerical algorithm by use of SEL and proved that the limit function of approximate solutions satisfied the diffraction problem in the sense of distribution. In this paper, the above numerical algorithm is improved with FFT and we show that the calculation speed is faster than the previous one.

Journal ArticleDOI
TL;DR: This letter presents an efficient split vector-radix-2/8 fast Fourier transform (FFT) algorithm that saves 14% real multiplications and has much lower arithmetic complexity than the split vector-radix-2/4 FFT algorithm.
Abstract: This letter presents an efficient split vector-radix-2/8 fast Fourier transform (FFT) algorithm. The split vector-radix-2/8 FFT algorithm saves 14% real multiplications and has much lower arithmetic complexity than the split vector-radix-2/4 FFT algorithm. Moreover, this algorithm reduces 25% data loads and stores compared with the split vector-radix-2/4 FFT algorithm.

Proceedings ArticleDOI
23 May 2004
TL;DR: The effect of the signal round-off errors on the accuracies of the multiplier-less fast Fourier transform-like transformation (ML-FFT) is studied.
Abstract: This paper studies the effect of the signal round-off errors on the accuracies of the multiplier-less fast Fourier transform-like transformation (ML-FFT). The idea of the ML-FFT is to parameterize the twiddle factors in the conventional FFT algorithm as certain rotation-like matrices and approximate the associated parameters inside these matrices by the sum-of-power-of-two (SOPOT) or canonical signed digits (CSD) representations. The error due to the SOPOT approximations is called the coefficient round-off error. Apart from this error, signal round-off error also occurs because of insufficient wordlengths. Using a recursive noise model of these errors, the minimum hardware to realize the ML-FFT subject to the prescribed output bit accuracy can be obtained using a random search algorithm. A design example is given to demonstrate the effectiveness of the proposed approach.
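
As a rough illustration of the coefficient round-off error described above, the sketch below approximates a real coefficient by a short sum of signed powers of two using a greedy search. This is an assumed, simplified procedure for illustration only; the paper optimizes wordlengths with a random search, which is not reproduced here. The names `sopot_approx`, `terms`, and `min_exp` are hypothetical.

```python
import math

def sopot_approx(value, terms, min_exp=-12):
    """Greedy sum-of-powers-of-two approximation of a real coefficient.

    Returns (chosen, approx) where chosen is a list of (sign, exponent)
    pairs and approx is the sum of sign * 2**exponent over that list.
    """
    residual = float(value)
    chosen = []
    for _ in range(terms):
        # Stop once the residual is below half of the smallest allowed term.
        if abs(residual) < 2.0 ** (min_exp - 1):
            break
        exp = max(round(math.log2(abs(residual))), min_exp)
        sign = 1 if residual > 0 else -1
        chosen.append((sign, exp))
        residual -= sign * 2.0 ** exp
    approx = sum(s * 2.0 ** e for s, e in chosen)
    return chosen, approx

if __name__ == "__main__":
    target = math.cos(math.pi / 4)
    for t in (1, 2, 3, 4):
        terms_used, approx = sopot_approx(target, t)
        print(t, terms_used, abs(approx - target))
```

Each added term costs one shift and one add in a multiplier-less datapath, so the number of terms gives a crude handle on the hardware versus accuracy trade-off.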