
Showing papers on "Split-radix FFT algorithm published in 2000"


Journal ArticleDOI
28 Aug 2000-Wear
TL;DR: In this article, discrete convolution combined with the FFT (DC-FFT) is adopted for contact problems in place of continuous convolution and the Fourier transform.

613 citations
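The DC-FFT idea in the entry above, evaluating a discrete convolution through FFTs, can be sketched as follows. This is a minimal illustration and not the article's contact-mechanics implementation: the function names `fft`, `dc_fft_convolve` and `direct_convolve` are hypothetical, and zero-padding to a power of two (to avoid circular wrap-around) is an assumption of this sketch.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dc_fft_convolve(a, b):
    """Linear convolution of real sequences a and b via zero-padded FFTs."""
    m = len(a) + len(b) - 1          # length of the linear convolution
    size = 1
    while size < m:                  # pad to a power of two >= m
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    prod = [p * q for p, q in zip(fa, fb)]
    y = fft(prod, inverse=True)      # unnormalised inverse, so divide by size
    return [v.real / size for v in y[:m]]

def direct_convolve(a, b):
    """Reference O(len(a) * len(b)) convolution, used only for comparison."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out
```

Without the padding, the product of the two spectra would correspond to a circular convolution, so the tail of the result would wrap onto its head.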


Proceedings ArticleDOI
08 Aug 2000
TL;DR: A new FFT pruning algorithm where the number of nonzero inputs or desired outputs can be arbitrary, and the implementation is similar to the FFT algorithms that use in-place computation, with a small alteration.
Abstract: The efficiency of the fast Fourier transform may be increased by removing operations on input values which are zero, and on output values which are not required; this procedure is known as FFT pruning. The algorithms proposed so far use decimation-in-time (DIT) or decimation-in-frequency (DIF) procedures and assume that, for an FFT with N = 2^M input points, the number of nonzero input points or desired output points equals 2^k for some integer k. In this paper we propose a new FFT pruning algorithm where the number of nonzero inputs or desired outputs can be arbitrary. The proposed algorithm works well with DIT as well as DIF procedures, and the implementation is similar to the FFT algorithms that use in-place computation, with a small alteration.

49 citations


Proceedings ArticleDOI
28 May 2000
TL;DR: An efficient implementation of the Continuous Flow 2N point Real to Complex FFT based on the Radix-2 version of Cooley-Tukey algorithm that allows minimizing the total memory requirement and a scalable FFT/IFFT.
Abstract: In this paper, an efficient implementation of the Continuous Flow 2N point Real to Complex FFT is presented. The computation is based on the Radix-2 version of Cooley-Tukey algorithm. The key feature of this implementation is the alternation between DIF (Decimation In Frequency) and DIT (Decimation In Time) in the computation of FFT and IFFT of successive symbols. It allows minimizing the total memory requirement. This method requires only 2*N complex memory locations to perform a 2*N point Real-to-Complex FFT of a continuous data flow when other current methods need 3*N or more. The Real to Complex FFT is computed in two steps: a Complex to Complex FFT then Post-Processing. The Complex to Real IFFT is also computed in two steps: Pre-Processing then a Complex to Complex IFFT. 'Cycle Stealing' allows sharing the clock cycles and the data memory banks between the Complex to Complex FFT/IFFT and the Post/Pre-Processing. Only four memory banks and two physical cells (Butterflies) are used to compute an FFT of up to 8192 real input samples with a computation speed twice as fast as the input data rate. This implementation allows a scalable FFT/IFFT: the same hardware resources are used for different FFT sizes 2*N = 2^n, where 1 ≤ n ≤ 13.

47 citations
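The two-step structure described in the entry above (a complex-to-complex FFT followed by post-processing) can be illustrated in software. The sketch below uses the common packing trick, in which even and odd real samples become the real and imaginary parts of an N-point complex sequence; it is an assumption that this matches the paper's post-processing, and the continuous-flow memory scheme and the DIF/DIT alternation are not modelled. The names `fft` and `real_fft` are hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft(x):
    """Bins 0..N of the 2N-point DFT of real x, using one N-point complex FFT."""
    n = len(x) // 2
    # Step 1: pack even samples as real parts, odd samples as imaginary parts.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(n)]
    zf = fft(z)
    # Step 2: post-processing separates the even and odd sub-spectra
    # and recombines them with a twiddle factor.
    out = []
    for k in range(n + 1):
        zk = zf[k % n]
        zc = zf[(n - k) % n].conjugate()
        e = (zk + zc) / 2            # DFT of the even-indexed samples
        o = (zk - zc) / 2j           # DFT of the odd-indexed samples
        out.append(e + cmath.exp(-1j * cmath.pi * k / n) * o)
    return out
```

Only bins 0..N are returned; for real input the remaining bins follow from conjugate symmetry.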


Journal ArticleDOI
TL;DR: Although the proposed algorithm does not reach the theoretical lower bound for the number of multiplications, the algorithm possesses the regular structure of the Cooley-Tukey FFT algorithms, therefore, the FFT implementation principles can also be applied to the discrete cosine transform.
Abstract: Modification to the architecture-oriented fast algorithm for discrete cosine transform of type II from Astola and Akopian (see ibid., vol.47, no.4, p.1109-24, April 1999) is presented, which results in a constant geometry algorithm with simplified parameterized node structure. Although the proposed algorithm does not reach the theoretical lower bound for the number of multiplications, the algorithm possesses the regular structure of the Cooley-Tukey FFT algorithms. Therefore, the FFT implementation principles can also be applied to the discrete cosine transform.

39 citations


Proceedings ArticleDOI
28 May 2000
TL;DR: In this paper, an implementation method for a single-chip 2048 complex point FFT in terms of sequential data processing is proposed and the convergent block floating point (CBFP) algorithm is used for the effective internal bit rounding.
Abstract: In this paper, we propose an implementation method for a single-chip 2048 complex point FFT in terms of sequential data processing. In order to reduce the required chip area for the sequential processing of 2 K complex data, a DRAM-like pipelined commutator architecture is used. The 16-point FFT is a basic building block of the entire FFT chip, and the 2048-point FFT consists of cascaded blocks with five stages of radix-4 and one stage of radix-2. Since each stage requires rounding of the resulting bits while maintaining the proper S/N ratio, the convergent block floating point (CBFP) algorithm is used for the effective internal bit rounding.

30 citations



Journal ArticleDOI
Ralf Hinze
TL;DR: An efficient iterative version of the FFT algorithm performs as a first step a bit-reversal permutation of the input list that swaps elements whose indices have binary representations that are the reverse of each other.
Abstract: One well known algorithm is the Fast Fourier Transform (FFT). An efficient iterative version of the FFT algorithm performs as a first step a bit-reversal permutation of the input list. The bit-reversal permutation swaps elements whose indices have binary representations that are the reverse of each other. Using an amortized approach, this operation can be made to run in linear time on a random-access machine. An intriguing question is whether a linear-time implementation is also feasible on a pointer machine, that is, in a purely functional setting. We show that the answer to this question is in the affirmative. In deriving a solution, we employ several advanced programming language concepts such as nested datatypes, associated fold and unfold operators, rank-2 types and polymorphic recursion.

28 citations
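For orientation, the bit-reversal permutation discussed in the entry above can be sketched on a random-access machine as follows. This is not the paper's purely functional, pointer-machine derivation; it is the conventional array version, written in Python, with the hypothetical name `bit_reverse_permute`. The table of reversed indices is built in linear time by reusing the entry for `i >> 1`.

```python
def bit_reverse_permute(xs):
    """Return xs with each element moved to the bit-reversed position of its index."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    rev = [0] * n
    for i in range(1, n):
        # The reversal of i is the reversal of i >> 1 shifted right by one,
        # with the lowest bit of i placed in the top position.
        rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (bits - 1))
    return [xs[rev[i]] for i in range(n)]

print(bit_reverse_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because reversing the bits twice restores the index, applying the function twice returns the original list.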


Proceedings ArticleDOI
13 Sep 2000
TL;DR: This paper designs a family of FFT processors, parameterized by the number of points, the dimension, the number of processors, and the internal dataflow, and shows how to map different dimensionless FFTs onto this hardware design.
Abstract: There exist fast Fourier transform (FFT) algorithms, called dimensionless FFTs, that work independently of dimension. These algorithms can be configured to compute different dimensional DFTs simply by relabeling the input data and by changing the values of the twiddle factors occurring in the butterfly operations. This observation allows us to design an FFT processor which, with minor reconfiguring, can compute one-, two-, and three-dimensional DFTs. In this paper we design a family of FFT processors, parameterized by the number of points, the dimension, the number of processors, and the internal dataflow, and show how to map different dimensionless FFTs onto this hardware design. Different dimensionless FFTs have different dataflows and consequently lead to different performance characteristics. Using a performance model we search for the optimal algorithm for the family of processors we considered. The resulting algorithm and corresponding hardware design were implemented on an FPGA.

25 citations


Patent
01 Aug 2000
TL;DR: In this article, a method for computing an out-of-place FFT in which each stage of the FFT has an identical signal flow geometry is presented, where the group loop has been eliminated, the twiddle factor data is stored in bit-reversed manner, and the output data values are stored with a unity stride.
Abstract: A method for computing an out of place FFT in which each stage of the FFT has an identical signal flow geometry. In each stage of the presently disclosed FFT method the group loop has been eliminated, the twiddle factor data is stored in bit-reversed manner, and the output data values are stored with a unity stride.

25 citations


Proceedings ArticleDOI
21 Aug 2000
TL;DR: A new and efficient algorithm for the computation of filter banks that performs the filtering in frequency domain to utilize the advantage of FFT by combining FFT with decimation filter.
Abstract: This paper proposes a new and efficient algorithm for the computation of filter banks. The algorithm performs the filtering in frequency domain to utilize the advantage of FFT. By combining FFT with decimation filter, more than 80% computation power can be saved.

22 citations


Journal ArticleDOI
TL;DR: A radix-7, decimation-in-space fast Fourier transform (FFT) for images defined on hexagonal aggregates, expressed in terms of the p-product, a generalization of matrix multiplication.
Abstract: Hexagonal aggregates are hierarchical arrangements of hexagonal cells. These hexagonal cells may be efficiently addressed using a scheme known as generalized balanced ternary for dimension 2, or GBT_2. The objects of interest in this paper are digital images whose domains are hexagonal aggregates. We define a discrete Fourier transform (DFT) for such images. The main result of this paper is a radix-7, decimation-in-space fast Fourier transform (FFT) for images defined on hexagonal aggregates. The algorithm has complexity N log_7 N. It is expressed in terms of the p-product, a generalization of matrix multiplication. Data reordering (also known as shuffle permutations) is generally associated with FFT algorithms. However, use of the p-product makes data reordering unnecessary.

Proceedings ArticleDOI
14 May 2000
TL;DR: This work uses the four-step and five-step algorithms to implement parallel one-dimensional FFT algorithms and reports performance results on a distributed memory parallel computer with (pseudo) vector SMP nodes, the HITACHI SR8000.
Abstract: We propose high-performance parallel one-dimensional fast Fourier transform (FFT) algorithms for distributed memory parallel computers with vector symmetric multiprocessor (SMP) nodes. The four-step FFT algorithm can be altered into a five-step FFT algorithm to expand the innermost loop length. We use the four-step and five-step algorithms to implement the parallel one-dimensional FFT algorithms. In our proposed parallel FFT algorithms, since we use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order. Performance results of one-dimensional power-of-two FFTs on a distributed memory parallel computer with (pseudo) vector SMP nodes, HITACHI SR8000, are reported. We succeeded in obtaining performance of about 38 GFLOPS on a 16-node SR8000.
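The four-step decomposition mentioned in the entry above can be sketched serially for N = n1 * n2 as follows. This is an illustrative, single-process version only: the cyclic data distribution, the all-to-all communication and the five-step variant from the paper are not modelled, and the names `fft` and `four_step_fft` as well as the index convention are assumptions of this sketch.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 with input index j = j1 + n1*j2, output index k = k2 + n2*k1."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n1 independent n2-point FFTs over the strided sequences x[j1::n1].
    rows = [fft(x[j1::n1]) for j1 in range(n1)]
    # Step 2: multiply by the twiddle factors w_N^(j1 * k2).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: transpose so that each fixed-k2 column becomes contiguous.
    cols = [[rows[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 4: n2 independent n1-point FFTs, then scatter to natural order.
    out = [0j] * n
    for k2 in range(n2):
        col = fft(cols[k2])
        for k1 in range(n1):
            out[k2 + n2 * k1] = col[k1]
    return out
```

The appeal of this layout is that steps 1 and 4 consist of many independent short transforms, which is what makes long inner loops and parallel distribution possible.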

Journal ArticleDOI
TL;DR: A method is presented for converting the m-dimensional discrete Fourier transform (MD-DFT) into a number of one-dimensional DFTs (1D-DFTs) by rearranging the order of the input sequence, which results in a considerable saving in the number of addition operations.
Abstract: A method is presented for converting the m-dimensional discrete Fourier transform (MD-DFT) into a number of one-dimensional DFTs (1D-DFTs) by rearranging the order of the input sequence. The result of this conversion is that the number of multiplications for computing an m-dimensional DFT is only 1/m times that of the usually used row-column DFT algorithm. To reduce the number of additions, a multidimensional polynomial transform (MD-PT) is then used and a considerable saving in the number of addition operations is also achieved.

Journal ArticleDOI
TL;DR: The use of wavelets for the solution of convolution equations is studied as a possible alternative to the well-established Fast Fourier Transform technique; representing the unknown solution with wavelets and the given data with so-called vaguelettes leads to an algorithm which is even faster than FFT.
Abstract: The use of wavelets for the solution of convolution equations is studied as a possible alternative to the well-established Fast Fourier Transform (FFT) technique. Two possible solution strategies are investigated: (1) The use of wavelets for the representation of both the given data and the unknown solution. This leads to an algorithm with good de-noising and data-compression properties. In terms of computational efficiency this algorithm is inferior to FFT. (2) The use of wavelets for the representation of the unknown solution and of so-called vaguelettes for the representations of the given data. This leads to an algorithm which is even faster than FFT.

Proceedings ArticleDOI
21 Aug 2000
TL;DR: A new approach for computing the DFT of arbitrary length is proposed, based on the arithmetic Fourier transform (AFT); it needs only O(N) multiplications and has a simple computational structure, so it can be easily performed in parallel and is very suitable for VLSI design.
Abstract: A new approach for computing the DFT of arbitrary length is proposed, which is based on the arithmetic Fourier transform (AFT). The algorithm needs only O(N) multiplications and has a simple computational structure, so it can be easily performed in parallel and it is very suitable for VLSI design. The algorithm is faster than the classical FFT when the length of the DFT contains relatively large factors. It is especially efficient for computing the DFT of prime length, where FFT does not work. The algorithm is competitive with the FFT in terms of accuracy. A method to enhance the accuracy of the algorithm is also proposed for cases when higher accuracy is required.

Journal ArticleDOI
TL;DR: This work proposes effective implementations in the case of multi-dimensional radix-2 FFT for the recent RISC workstation and the vector-type supercomputer, respectively.

Journal ArticleDOI
TL;DR: Two new numerical algorithms based on the Fast Fourier Transform techniques (FFT) are used to solve the structured robustness analysis problem in the case of one parameter entering polynomially (Barmish, 1994).

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Goedecker's (1997) techniques are used to obtain an algorithm for computing the radix-6 FFT with fewer floating-point instructions than conventional radix-6 FFT algorithms.
Abstract: A new radix-6 FFT algorithm suitable for multiply-add instruction is proposed. The new radix-6 FFT algorithm requires fewer floating-point instructions than the conventional radix-6 FFT algorithms on processors that have a multiply-add instruction. We use Goedecker's (1997) techniques to obtain an algorithm for computing radix-6 FFT with fewer floating-point instructions than conventional radix-6 FFT algorithms. The number of floating-point instructions for the new radix-6 FFT algorithm is compared with those of conventional radix-6 FFT algorithms on processors with multiply-add instruction.

Book ChapterDOI
18 Jun 2000
TL;DR: A high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of vector symmetric multiprocessor (SMP) nodes is proposed; the three-dimensional FFT is recast as a multirow FFT to expand the innermost loop length.
Abstract: In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of vector symmetric multiprocessor (SMP) nodes. The three-dimensional FFT algorithm can be altered into a multirow FFT algorithm to expand the innermost loop length. We use the multirow FFT algorithm to implement the parallel three-dimensional FFT algorithm. Performance results of three-dimensional power-of-two FFTs on clusters of (pseudo) vector SMP nodes, Hitachi SR8000, are reported. We succeeded in obtaining performance of about 40 GFLOPS on a 16-node Hitachi SR8000.

Proceedings ArticleDOI
18 Dec 2000
TL;DR: A protected system that computes the FFT, truncates small coefficients and compresses the remaining nonzero coefficients using lossless arithmetic coding is described, which achieves end-to-end error detection.
Abstract: Transform coefficients carry important data characteristics but can also be compressed significantly in many remote sensing applications. Failures in the several computing facilities that execute lossy compression algorithms and support the transmission of Fourier transform data can corrupt the values beyond recovery at the final destination. Various methods for including fault tolerance at the data processing level are exemplified by describing a protected system that computes the FFT, truncates small coefficients and compresses the remaining nonzero coefficients using lossless arithmetic coding. Algorithmic checks within the FFT and arithmetic encoding and decoding operations are augmented with additional features between and across several subsystems involved in compressing and transmitting the FFT data. End-to-end error detection is achieved in this manner.

Proceedings ArticleDOI
29 Oct 2000
TL;DR: The proposed architecture is based on a new index mapping scheme which has two levels of decomposition, and can be efficiently used to realize a pipelined implementation of the N^m-point m-dimensional DFT simply by omitting some of the twiddle factor ROMs.
Abstract: This paper presents a new pipelined architecture for the N^m-point FFT (Fast Fourier Transform). Unlike conventional pipelined architectures, which are based on the DIT (Decimation in Time) or DIF (Decimation in Frequency) algorithms, the proposed architecture is based on a new index mapping scheme which has two levels of decomposition. As a result, the new architecture can be efficiently used to realize a pipelined implementation of the N^m-point m-dimensional DFT simply by omitting some of the twiddle factor ROMs.

Journal ArticleDOI
TL;DR: This work proposes an original multidimensional fast Fourier transform (FFT) algorithm where the computation is first organized into multiplier-free butterflies and then completed by 1-D FFTs; its total computational cost decreases as the signal space dimensions increase, and its efficiency is reported to be superior to that of any other multidimensional FFT algorithm.
Abstract: This work proposes an original multidimensional fast Fourier transform (FFT) algorithm where the computation is first organized into multiplier-free butterflies and then completed by 1-D FFTs. The properties of well-known 1-D FFT algorithms blend in quite nicely with those of the proposed multidimensional FFT scheme, extending their computational and structural characteristics to it. Strong points of the proposed method are that its total computational cost decreases as the signal space dimensions increase and that its efficiency is superior to that of any other multidimensional FFT algorithm.

Journal Article
TL;DR: The procedures of the DCT (discrete cosine transform) and FFT (fast Fourier transform) which map integers to integers by using the lifting scheme and the butterfly configuration of the FFT are described.
Abstract: In this paper, the authors describe the procedures of the DCT (discrete cosine transform) and FFT (fast Fourier transform) which map integers to integers by using the lifting scheme and the butterfly configuration of the FFT. The transform is reversible, fast and suitable for lossless image compression.
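How a lifting scheme can make a butterfly-style rotation map integers to integers reversibly can be sketched as below. The sketch assumes the standard factorisation of a plane rotation into three shear (lifting) steps with rounding after each step; whether the paper uses exactly this factorisation is not stated in the abstract. The names `lift_rotate` and `lift_rotate_inverse` are hypothetical, and sin(theta) must be nonzero.

```python
import math

def _coeffs(theta):
    """Lifting coefficients for a rotation by theta (requires sin(theta) != 0)."""
    s = math.sin(theta)
    p = (math.cos(theta) - 1.0) / s
    return p, s

def lift_rotate(x, y, theta):
    """Integer approximation of rotating (x, y) by theta using three lifting steps."""
    p, s = _coeffs(theta)
    x += round(p * y)   # shear 1
    y += round(s * x)   # shear 2
    x += round(p * y)   # shear 3
    return x, y

def lift_rotate_inverse(x, y, theta):
    """Exact inverse: undo the lifting steps in reverse order."""
    p, s = _coeffs(theta)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y
```

Each step changes only one variable by a rounded function of the other, so subtracting the same rounded value undoes it exactly. This is why the round trip is lossless even though the forward result is only an approximation of the true rotation.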

Book ChapterDOI
10 Sep 2000
TL;DR: The main idea of the paper is that fast algorithms, like FFT, can be made more efficient in the context of an algebra, rather than in the more singular quaternion or complex algebras structure.
Abstract: The main idea of the paper is that fast algorithms, like FFT, can be made more efficient in the context of an algebra, rather than in the more singular quaternion or complex algebras structure. However, the complex algebra structure can then be recovered as a projection from the larger algebra in which it is embedded. Namely, the 12-dimensional algebra (hurwitzion algebra) having the basis elements associated with the integer Hurwitz quaternions is introduced. The computational aspects of the hurwitzion arithmetic are considered. The overlapped fast algorithms of two-dimensional discrete Fourier transform of an RGB image are also developed.

Proceedings ArticleDOI
16 Jul 2000
TL;DR: An FFT algorithm for piecewise smooth functions by using a double interpolation procedure and Gaussian quadrature is developed, based on the method of Sorets (1995).
Abstract: In this paper, based on the method of Sorets (1995), we develop an FFT algorithm for piecewise smooth functions by using a double interpolation procedure. With the help of the double interpolation and Gaussian quadrature, the algorithm can be applied to both uniformly and nonuniformly sampled data. The formulation of this algorithm is developed, followed by the implementation procedures and complexity analysis. Finally, we show the numerical results to demonstrate the performance of the algorithm.

01 Jan 2000
TL;DR: In this article, a modified spherical 2D FFT formula for the computation of geoid undulations has been developed in order to reduce the impact of the second type of approximation error, which is the meridian convergence at higher latitudes.
Abstract: The fast Fourier transform (FFT) technique is a very powerful tool for the efficient evaluation of gravity field convolution integrals. At present, there exist three types of convolution formulae in use, i.e. the planar 2D convolution, the spherical 2D convolution and the spherical 1D convolution. The largest drawback of both the planar and the spherical 2D FFT methods is that, due to the approximations in the kernel function, only inexact results can be achieved. The reason is the meridian convergence at higher latitudes. As the meridians converge, the Δφ, Δλ blocks do not form a rectangular grid, as is assumed in 2D FFT methods. It should be pointed out that the meridian convergence not only leads to an approximation error in the kernel function, but also causes an approximation error in the computer implementation of the 2D FFT. In order to reduce the impact of the second type of approximation error, a modified spherical 2D FFT formula for the computation of geoid undulations has been developed in this paper. A series of numerical tests have been carried out to illustrate the improvement over the old spherical 2D FFT. The second part of this paper discusses the influences of a spherical harmonic reference field, a limited cap size and a modified Stokes kernel on geoid computation. The geoid results over China obtained with different modified Stokes kernels and different integration radii have been compared to GPS leveling and altimeter-measured geoidal undulations to obtain a set of optimum geoid computation parameters.

01 Jan 2000
TL;DR: A new Fourier analysis technique called the arithmetic Fourier transform (AFT) is used to compute DFT, which needs only O(N) multiplications and opens up a new approach for the fast computation of DFT.
Abstract: The Discrete Fourier Transform (DFT) plays an important role in digital signal processing and many other fields. In this paper, a new Fourier analysis technique called the arithmetic Fourier transform (AFT) is used to compute the DFT. This algorithm needs only O(N) multiplications. The process of the algorithm is simple and it has a unified formula, which overcomes the disadvantage of the traditional fast method that has a complex program containing too many subroutines. The algorithm can be easily performed in parallel and is especially suitable for VLSI design. For a DFT whose length contains large prime factors, and especially for a DFT of prime length, it is faster than the traditional FFT method. The algorithm opens up a new approach for the fast computation of the DFT.

Journal Article
TL;DR: This paper recasts the FFT algorithm as a recurrence expression, a_k^(j) = a_x^(j-1) + a_y^(j-1) ω^p, where x, y and p can be calculated from binary bits.
Abstract: This paper recasts the FFT algorithm as a recurrence expression, a_k^(j) = a_x^(j-1) + a_y^(j-1) ω^p, where x, y and p can be calculated from binary bits.

Journal Article
TL;DR: It is shown that a high-radix FFT with generator γ = 3 over GF(F_n) can be used for decoding long RS codes of length 2^(2^n), and is considerably faster than a decoder using the usual radix-2 FFT.
Abstract: Presents a new algorithm for transform decoding of Reed-Solomon codes, based on number-theoretic transforms. It is shown that a high-radix FFT with generator γ = 3 over GF(F_n) can be used for decoding long RS codes of length 2^(2^n). Such an RS decoder is considerably faster than a decoder using the usual radix-2 FFT. This technique applies most ideally to the RS(255,223) code currently being considered for space and satellite communication applications.

Journal Article
TL;DR: A new hardware oriented memory accessing algorithm for pipeline radix 2×2 FFT processor is proposed, and the data and twiddle factor address generation hardware is shown to have higher speed than previous methods.
Abstract: A new hardware oriented memory accessing algorithm for pipeline radix 2×2 FFT processor is proposed. The memory assignment is “in place” to minimize memory size, and memory bank conflict free to allow simultaneous access to the 4 data needed for calculation of each of the radix 2×2 butterflies as they occur in the algorithm. Address generation for twiddle factors is also described. The data and twiddle factor address generation hardware is shown to have higher speed than previous methods.