Showing papers on "Prime-factor FFT algorithm" published in 1997


Journal ArticleDOI
TL;DR: A new formulation of fast Fourier transformation kernels for radix 2, 3, 4, and 5 is presented; the kernels have a perfect balance of multiplies and adds and give higher performance on machines that have a single multiply-add (mult-add) instruction.
Abstract: We present a new formulation of fast Fourier transformation (FFT) kernels for radix 2, 3, 4, and 5, which have a perfect balance of multiplies and adds. These kernels give higher performance on machines that have a single multiply-add (mult-add) instruction. We demonstrate the superiority of this new kernel on IBM and SGI workstations.

181 citations


Journal ArticleDOI
TL;DR: Experimental comparisons show that an implementation of the new algorithm outperforms a similarly coded right-looking algorithm on six different RISC architectures, that the new algorithm performs fewer cache misses than any other algorithm tested, and that it benefits more from Strassen's matrix-multiplication algorithm.
Abstract: This paper presents a new partitioned algorithm for LU decomposition with partial pivoting. The new algorithm, called the recursively partitioned algorithm, is based on a recursive partitioning of the matrix. The paper analyzes the locality of reference in the new algorithm and the locality of reference in a known and widely used partitioned algorithm for LU decomposition called the right-looking algorithm. The analysis reveals that the new algorithm performs a factor of $\Theta(\sqrt{M/n})$ fewer I/O operations (or cache misses) than the right-looking algorithm, where $n$ is the order of the matrix and $M$ is the size of primary memory. The analysis also determines the optimal block size for the right-looking algorithm. Experimental comparisons between the new algorithm and the right-looking algorithm show that an implementation of the new algorithm outperforms a similarly coded right-looking algorithm on six different RISC architectures, that the new algorithm performs fewer cache misses than any other algorithm tested, and that it benefits more from Strassen's matrix-multiplication algorithm.

169 citations


Journal ArticleDOI
TL;DR: A simple procedure for designing finite-extent impulse response (FIR) discrete-time filters using the FFT algorithm is described and extension of the design method to higher dimensions is straightforward.
Abstract: The fast Fourier transform (FFT) algorithm has been used in a variety of applications in signal and image processing. In this article, a simple procedure for designing finite-extent impulse response (FIR) discrete-time filters using the FFT algorithm is described. The zero-phase (or linear phase) FIR filter design problem is formulated to alternately satisfy the frequency domain constraints on the magnitude response bounds and time domain constraints on the impulse response support. The design scheme is iterative in which each iteration requires two FFT computations. The resultant filter is an equiripple approximation to the desired frequency response. The main advantage of the FFT-based design method is its implementational simplicity and versatility. Furthermore, the way the algorithm works is intuitive and any additional constraint can be incorporated in the iterations, as long as the convexity property of the overall operations is preserved. In one-dimensional cases, the most widely used equiripple FIR filter design algorithm is the Parks-McClellan algorithm (1972). This algorithm is based on linear programming, and it is computationally efficient. However, it cannot be generalized to higher dimensions. Extension of our design method to higher dimensions is straightforward. In this case two multidimensional FFT computations are needed in each iteration.
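
The iteration described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the low-pass specification, the parameter names (`half_len`, `n_fft`, `pass_edge`, `stop_edge`, `ripple`) and the loose bounds used in the transition band are all assumptions, and a naive $O(N^2)$ DFT stands in for an FFT routine so that the block needs only the standard library.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) DFT standing in for an FFT routine (same result, slower)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * j * k / n)
            for j, v in enumerate(values))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def design_lowpass(half_len=10, n_fft=64, pass_edge=0.1, stop_edge=0.2,
                   ripple=0.05, iterations=30):
    """Zero-phase low-pass FIR sketch by alternating projections.

    All specification values are hypothetical. Taps live at circular
    indices -half_len..half_len; frequencies are in cycles per sample.
    """
    def bounds(k):
        freq = min(k, n_fft - k) / n_fft       # 0 .. 0.5, symmetric in k
        if freq <= pass_edge:
            return 1.0 - ripple, 1.0 + ripple  # passband bounds
        if freq >= stop_edge:
            return -ripple, ripple             # stopband bounds
        return -ripple, 1.0 + ripple           # transition band: loose bounds

    h = [0.0] * n_fft
    h[0] = 1.0                                 # start from a unit impulse
    for _ in range(iterations):
        # First transform: h is real and circularly symmetric, so its
        # response is real; clip it to the frequency-domain bounds.
        response = [v.real for v in dft(h)]
        clipped = [min(max(r, bounds(k)[0]), bounds(k)[1])
                   for k, r in enumerate(response)]
        # Second transform: return to the time domain and enforce the
        # impulse-response support constraint.
        h = [v.real for v in dft(clipped, inverse=True)]
        for n in range(half_len + 1, n_fft - half_len):
            h[n] = 0.0
    return h
```

Each pass uses exactly two transforms, matching the cost quoted in the abstract. Both steps (clipping to an interval, zeroing samples outside the support) are projections onto convex sets, which is the property the abstract says any added constraint must preserve.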

108 citations


Journal ArticleDOI
TL;DR: A new subspace algorithm for the identification of multi-input multi-output linear discrete time systems from measured power spectrum data is presented, showing how the state space system matrices can be determined by taking the inverse discrete Fourier transform of the given data and applying the result to a new realization algorithm.

68 citations


Proceedings ArticleDOI
19 May 1997
TL;DR: The design of a custom CMOS Fast Fourier Transform (FFT) processor for computing 256-point complex FFT, well suited for real-time spectrum analysis in instrumentation and measurement applications is described.
Abstract: This paper describes in detail the design of a custom CMOS Fast Fourier Transform (FFT) processor for computing 256-point complex FFT. The FFT is well suited for real-time spectrum analysis in instrumentation and measurement applications. The FFT butterfly processor consists of one parallel-parallel multiplier and two adders. It is capable of computing one butterfly computation every 100 ns; thus it can compute a 256-point complex FFT in 25.6 μs, excluding data input and output processes.
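
A quick consistency check on the timing quoted above, assuming butterflies are computed one at a time at the stated rate:

$$\frac{25.6\ \mu\text{s}}{100\ \text{ns}} = 256 \text{ butterflies}.$$

That count matches a radix-4 decomposition, $(256/4)\cdot\log_4 256 = 64 \cdot 4 = 256$, whereas a radix-2 decomposition would need $(256/2)\cdot\log_2 256 = 1024$ butterflies, or 102.4 μs at the same rate. The abstract does not state the radix, so radix-4 is only an inference from the numbers.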

35 citations


Proceedings ArticleDOI
H. Guo, C.S. Burrus
21 Apr 1997
TL;DR: An algorithm that uses the discrete wavelet transform as a tool to compute the discrete Fourier transform (DFT) and the Cooley-Tukey FFT is shown to be a special case of the proposed algorithm when the wavelets in use are trivial.
Abstract: We propose an algorithm that uses the discrete wavelet transform (DWT) as a tool to compute the discrete Fourier transform (DFT). The Cooley-Tukey FFT is shown to be a special case of the proposed algorithm when the wavelets in use are trivial. If no intermediate coefficients are dropped and no approximations are made, the proposed algorithm computes the exact result, and its computational complexity is of the same order as that of the FFT, i.e. $O(N \log_2 N)$. The main advantage of the proposed algorithm is that the good time and frequency localization of wavelets can be exploited to approximate the Fourier transform for many classes of signals, resulting in much less computation. Thus the new algorithm provides an efficient complexity vs. accuracy tradeoff. When approximations are allowed, under certain sparsity conditions, the algorithm can achieve linear complexity, i.e. $O(N)$. The proposed algorithm also has built-in noise reduction capability.

33 citations


Journal ArticleDOI
01 Oct 1997
TL;DR: The authors propose one-dimensional and two-dimensional systolic architectures for the discrete Hilbert transform that have the features of massive parallelism, high pipelining, regular data flow, modular nature and local interconnections.
Abstract: A new fast parallel array algorithm to compute the discrete Hilbert transform for radix-2 length sequences is proposed. Unlike the existing fast methods which use transforms such as the fast Fourier transform, the proposed algorithm does not require the help of any fast transforms. This array algorithm offers a suitable expression for developing a VLSI systolic array for the discrete Hilbert transform. The authors propose one-dimensional and two-dimensional systolic architectures for the discrete Hilbert transform. The proposed architectures have the features of massive parallelism, high pipelining, regular data flow, modular nature and local interconnections. These arrays offer high speed computation of the discrete Hilbert transform for real-time signal processing applications.

28 citations


Journal ArticleDOI
TL;DR: An error propagation model is proposed for the in-place decimation-in-time version of the radix-2 FFT algorithm and an accurate error expression and error variance for the computation of FFT are derived.
Abstract: An error propagation model is proposed for the in-place decimation-in-time version of the radix-2 FFT algorithm. With the model, an accurate error expression and error variance for the computation of FFT are derived. This correspondence deals with fixed-point and block floating-point arithmetic. Simulation results agree closely with the theoretical predicted ones. We find that some roundoff errors at different stages correlate with each other. The density of correlations is closely associated with the round-off approach used in butterfly calculations.

28 citations


Journal ArticleDOI
TL;DR: In this article, a fast algorithm for numerical calculation of arbitrary real order of fractional Fourier transforms is presented, which allows one to freely choose the sampling resolutions in both x and u-space under the restriction of the Nyquist sampling theorem.

22 citations


Journal ArticleDOI
01 Aug 1997
TL;DR: A new pruning method for an FFT type of transform structure is proposed, whose novelty lies in the fact that it is able to complete a previously pruned transform or to progress from one level of pruning to another.
Abstract: A new pruning method for an FFT type of transform structure is proposed. Its novelty lies in the fact that, besides being able to prune the transform, it is able to complete a previously pruned transform or to progress from one level of pruning to another. The method can be directly applied to fast progressive image coding.

14 citations


Proceedings ArticleDOI
12 Apr 1997
TL;DR: The results of this work will serve as a framework for creating an object-oriented, poly-functional FFT implementation which will automatically choose the most efficient algorithm given user-specified constraints.
Abstract: A large number of fast Fourier transform (FFT) algorithms have been developed over the years. Among these, the most promising are the radix-2, radix-4, split-radix, fast Hartley transform (FHT), quick Fourier transform (QFT), and the decimation-in-time-frequency (DITF) algorithms. We present a rigorous analysis of these algorithms that includes the number of mathematical operations, computational time, memory requirements, and object code size. The results of this work will serve as a framework for creating an object-oriented, poly-functional FFT implementation which will automatically choose the most efficient algorithm given user-specified constraints.

Journal ArticleDOI
TL;DR: In this article, a set of efficient formulas to evaluate the deflections of the vertical on the sphere using gridded data was presented, including the Vening-Meinesz formula, the topographic indirect effect on the vertical as well as the terrain corrections.
Abstract: This paper presents a set of efficient formulas to evaluate the deflections of the vertical on the sphere using gridded data. The Vening-Meinesz formula, the topographic indirect effect on the deflections of the vertical as well as the terrain corrections are expressed as both 2D and 1D convolutions on the sphere, and consequently can be evaluated by the 2D and the 1D fast Fourier transform (FFT). When compared with the results obtained from pointwise integration, the use of the 1D FFT gives identical results, and therefore these results were used as control values in this paper. The use of the spherical 2D FFT improves significantly the computational efficiency with little sacrifice of accuracy (0.6″ rms difference from the 1D FFT results). The planar 2D FFT, which is as efficient as the spherical 2D FFT, gives worse results (1.2″ rms difference from the 1D FFT results) because of the extra approximations.

Journal ArticleDOI
TL;DR: Empirical comparisons of two avenues, namely the network and fast Fourier transform (FFT) algorithms, for an exact goodness-of-fit test on a multinomial show that the network-cum-polynomial multiplication algorithm is the more efficient and accurate of the two.
Abstract: Multinomial goodness-of-fit tests arise in a diversity of milieu. The long history of the problem has spawned a multitude of asymptotic tests. If the sample size relative to the number of categories is small, the accuracy of these tests is compromised. In that case, an exact test is a prudent option. But such tests are computationally intensive and need efficient algorithms. This paper gives a conceptual overview, and empirical comparisons of two avenues, namely the network and fast Fourier transform (FFT) algorithms, for an exact goodness-of-fit test on a multinomial. We show that a recursive execution of a polynomial product forms the basis of both these approaches. Specific details to implement the network method, and techniques to enhance the efficiency of the FFT algorithm are given. Our empirical comparisons show that for exact analysis with the chi-square and likelihood ratio statistics, the network-cum-polynomial multiplication algorithm is the more efficient and accurate of the two.

Patent
25 Mar 1997
TL;DR: A system for processing a signal made up of at least one spectral component is presented; it samples the signal periodically, computes a fast Fourier transform (FFT) of the samples using substantially exclusively non-complex arithmetic operations, and uses a resultant portion of the FFT to obtain the phase and amplitude of each spectral component.
Abstract: A system for processing a signal made up of at least one spectral component. The system includes a sampling circuit for periodically sampling the signal to obtain a sequence of discrete data samples representing the signal; and a computational circuit for computing a fast Fourier transform (FFT) of the sequence of discrete data samples to an extent which the FFT is computed using substantially exclusively non-complex arithmetic operations, and for using a resultant portion of the FFT to obtain a phase and amplitude of the at least one spectral component making up the signal.

Journal ArticleDOI
TL;DR: A fast, lossy image compression algorithm based on a slight variation of the classical 2-D fast Fourier transform (FFT), which is applied to pictures generated by classical interferometers, photographic equipment, particle tracing instruments, and IR cameras and the results are extremely encouraging.
Abstract: In the framework of European Space Agency (ESA) FluidPac satellite mission, we have developed a fast, lossy image compression algorithm (ICA) based on a slight variation of the classical 2-D fast Fourier transform (FFT). In essence, given a monochrome picture, the ICA calculates (almost) its Fourier spectrum. It then applies a low-pass filter to eliminate all Fourier coefficients beyond a certain user-defined cutoff frequency. Finally, it further compresses by encoding the surviving FFT coefficients in a more memory efficient manner. The proposed scheme works best with pictures where the high-frequency data are of little value. This is precisely the case with electronic speckle pattern interferometer (ESPI) images. The ICA low-pass filter removes the speckle noise while preserving the useful scientific information: the interference fringe pattern. We have, however, also applied the FluidPac ICA to pictures generated by classical interferometers, photographic equipment, particle tracing instruments, and IR cameras. The results are extremely encouraging.

Journal ArticleDOI
TL;DR: An efficient algorithm for the two-dimensional (2-D) arithmetic Fourier transform (AFT) based on the Möbius inversion formula of odd number series is presented, which requires fewer multiplications and has lower complexity than previous algorithms.
Abstract: This article presents an efficient algorithm for the two-dimensional (2-D) arithmetic Fourier transform (AFT) based on the Möbius inversion formula of odd number series. It requires fewer multiplications and has lower complexity than previous algorithms. In addition, a technique is proposed to carry out the on-axis Fourier coefficients. A parallel VLSI architecture is developed for the new algorithm.

Journal ArticleDOI
TL;DR: A certain equivalence relation is defined on the set of bijections that lead to FFT algorithms, and its connection with isomorphism classes of group extensions is studied.
Abstract: Fast Fourier transform algorithms rely upon the choice of certain bijective mappings between the indices of the data arrays. The two basic mappings used in the literature lead to Cooley–Tukey algorithms or to prime factor algorithms. But many other bijections also lead to FFT algorithms, and a complete classification of these mappings is provided. One particular choice leads to a new FFT algorithm that generalizes the prime factor algorithm. It has the advantage of reducing the floating point operation count by reducing the number of trigonometric function evaluations.
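
As an illustration of the index bijections mentioned in this abstract, the sketch below uses the classical prime factor (Good-Thomas) mapping for a length $N = N_1 N_2$ with coprime factors. It is the textbook mapping, not the generalized algorithm proposed in the paper; the function names are hypothetical, and a naive DFT is used for the short sub-transforms to keep the block self-contained.

```python
import cmath
from math import gcd


def dft(x):
    """Naive O(N^2) DFT, used for the short sub-transforms and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def prime_factor_dft(x, n1, n2):
    """Length n1*n2 DFT via the Good-Thomas index maps (n1, n2 coprime)."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1 * n2 with coprime n1, n2")

    # Input map: n = (n2*a + n1*b) mod N arranges the data as an n1 x n2 grid.
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]

    # Length-n2 transforms along rows, then length-n1 transforms along
    # columns. No twiddle factors are needed between the two stages.
    rows = [dft(row) for row in grid]
    cols = [dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]

    # Output map (Chinese remainder theorem): k = k1 mod n1 and k = k2 mod n2.
    inv_n2 = pow(n2, -1, n1)   # modular inverse, Python 3.8+
    inv_n1 = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            k = (k1 * n2 * inv_n2 + k2 * n1 * inv_n1) % n
            out[k] = cols[k2][k1]
    return out
```

With these maps the product $nk$ reduces modulo $N$ to $N_2 a k_1 + N_1 b k_2$, so the length-$N$ kernel factors exactly into the two short kernels. A Cooley-Tukey mapping would instead leave a twiddle factor between the stages.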

Journal ArticleDOI
TL;DR: In this article, it is demonstrated that the mathematical formulation of Abate and Dubner (1968) and the improved FFT-based Laplace inversion algorithm of Hwang et al. (1991) can be combined to give an efficient algorithm for determining accurate coefficients of a power series expansion.

Journal ArticleDOI
TL;DR: A polynomial-based approach is simpler and derives naturally from the fact that exact distributions for many discrete data models, including those reviewed, arise from polynomials, and facilitates a synthesis of diverse algorithms used in the field of exact inference.

Journal ArticleDOI
TL;DR: A general split-radix algorithm is presented to compute 2D discrete Fourier transforms of sequence length $q \cdot 2^m$ by $q \cdot 2^m$, where $q$ is an odd integer, to achieve savings in the number of operations.
Abstract: A general split-radix algorithm is presented to compute 2D discrete Fourier transforms of sequence length $q \cdot 2^m$ by $q \cdot 2^m$, where $q$ is an odd integer. By setting different values of $q$, DFT's of various sequence lengths can be efficiently computed. When $q = 3$, for example, savings in the number of operations can be achieved in comparison with that needed by other algorithms.

Journal ArticleDOI
TL;DR: Theorems that identify the symmetry in f[x, y] based on the depth of the quadtree to expedite 2-D FFT computation of coherent digital images are presented and applied in transform coding systems and lossy compression of images.
Abstract: The discrete Fourier transform (DFT) of a real sequence f[x, y] of size $N \times N$, where $N = 2^n$, can be computed by a two-dimensional (2-D) FFT of size N/4, or smaller if f[x, y] is known to have certain symmetries. This paper presents theorems that identify the symmetry in f[x, y] based on the depth of the quadtree to expedite 2-D FFT computation of coherent digital images. In principle, it establishes that if the quadtree of f[x, y] has maximum depth k

01 Jan 1997
TL;DR: Time-localization can be achieved by first sliding a window along the signal and then taking the FT over each interval, and the Gabor transform, which uses a Gaussian window function, is often used.
Abstract: analysis has been the Fourier transform (FT) using the fast Fourier transform (FFT) algorithm. However, the FFT method assumes the signal to be stationary and is thereby insensitive to its varying features. Time-localization can be achieved by first sliding a window along the signal and then taking the FT over each interval. The Gabor transform (GT), which uses a Gaussian window function, is often ap
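
The windowing idea in this entry (slide a window along the signal, then take the Fourier transform of each interval) can be sketched as below. The function name, the hop size and the Gaussian width `sigma` are illustrative assumptions, and a naive DFT replaces the FFT to keep the block dependency-free.

```python
import cmath
import math


def gabor_transform(signal, win_len, hop, sigma):
    """Magnitude spectra of Gaussian-windowed segments of a real signal.

    Returns one list per window position, holding bins 0 .. win_len // 2.
    """
    center = (win_len - 1) / 2
    window = [math.exp(-0.5 * ((i - center) / sigma) ** 2)
              for i in range(win_len)]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(win_len)]
        spectrum = [
            abs(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n in range(win_len)))
            for k in range(win_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames
```

For a signal whose frequency changes halfway through, a single transform over the whole record shows both frequencies but not when each occurs, while the frames returned here peak at different bins before and after the change.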

Journal ArticleDOI
TL;DR: In this article, the exchange-correlation potential in crystals is calculated using fast Fourier transform (FFT) instead of the standard FFT algorithm, which has much greater numerical accuracy than the standard method.

Proceedings ArticleDOI
19 Sep 1997
TL;DR: The implementation of 2D FFT is presented as a general procedure by row-column method and vector-radix method based on a general-purpose massively parallel processing system--DAWN1000 developed in China.
Abstract: Two-dimensional (2D) Discrete Fourier Transform (DFT) frequently needs to be performed in the digital image processing. Although the computing time of 2D DFT can be dramatically reduced by using 2D Fast Fourier Transform (FFT), the processing speed of a very large array is yet intolerable. The development of parallel processing system promotes the application of 2D FFT. In this paper, we present the implementation of 2D FFT as a general procedure by row-column method and vector-radix method based on a general-purpose massively parallel processing system--DAWN1000 developed in China. Even though the 2D FFT has parallel characteristics in nature, the requirement of corner-turning and the existence of data communication make its implementation more complicated. We analyze the impact of the machine capacity and the computing complexity on the algorithm efficiency and evaluate the implementation in terms of the arithmetic operations as well as the data transfer. The comparison of the two methods shows the fact that each method has its own advantages and disadvantages. Combining their traits, we design a new implementation algorithm concerning its flexibility, the efficiency and the complexity of the communication. As an example, we fulfill the spaceborne SAR image processing by using the new approach. Keywords: Parallel Processing, MPP, Row-Column Algorithm, Vector-Radix Algorithm, Block Algorithm, Two-Dimensional FFT, SAR Image Processing

Proceedings ArticleDOI
04 May 1997
TL;DR: It is shown that complex numbers can be approximated accurately by cyclotomic integers, and this idea is combined with Chinese remaindering strategies in the cyclotomic integers to give a O(b1+?
Abstract: Many applications of fast Fourier transforms (FFT's), such as computer tomography, geophysical signal processing, high resolution imaging radars, and prediction filters, require high precision output. The usual method of fixed point computation of FFT's of vectors of length $2^l$ leads to an average loss of $l/2$ bits of precision. This phenomenon, often referred to as computational noise, causes major problems for arithmetic units with limited precision which are often used for real time applications. Several researchers have noted that calculation of FFT's with algebraic integers avoids computational noise entirely, see, e.g., [3]. We will show that complex numbers can be approximated accurately by cyclotomic integers, and combine this idea with Chinese remaindering strategies in the cyclotomic integers to, roughly, give a O(b1+? L log (L)) algorithm to compute b-bit precision FFT's of length L. The first part of the paper will describe the FFT strategy, assuming good app..

Journal ArticleDOI
E.A. Hashish
TL;DR: In this paper, the performance of the FFT inversion method is investigated for the case of band limited scattered field data and a modification of the inversion algorithm is presented to increase the accuracy of the method in this case.
Abstract: The inverse scattering techniques have become widely used in many applications. These techniques are based on either continuous profile inversion or discrete multi-layer models. The FFT inversion method has been newly introduced for the detection of discrete multi-layer models. In this paper, the performance of the FFT inversion method is investigated for the case of band limited scattered field data. A modification of the inversion algorithm is presented to increase the accuracy of the method in this case. The proposed algorithm exhibits much better accuracy than the directly applied FFT inversion method.

Proceedings ArticleDOI
17 Nov 1997
TL;DR: This paper proposes a concurrent fault-detection scheme for FFT processors that requires no extra computations for locating a pair of faulty butterfly units and can be used for highly reliable real-time systems.
Abstract: This paper proposes a concurrent fault-detection scheme for FFT processors. In the scheme, fault detection is made by comparing the pair of outputs from butterfly units based on the FFT algorithm. The hardware overhead for the scheme is O(N) where N is the number of input data. This scheme requires no extra computations for locating a pair of faulty butterfly units, therefore, the scheme can be used for highly reliable real-time systems.

Journal ArticleDOI
TL;DR: In this article, an analytical Fourier transform method (AFT) was proposed to calculate semiclassical eigenvalues of multidimensional systems in the regular regime, which combines the best properties of those FFT procedures: accuracy, stability and simplicity.

Journal ArticleDOI
TL;DR: An efficient integer squaring algorithm (involving the fast Fourier transform modulo $F_8$) that was used on a 486 computer to discover a large pair of twin primes.
Abstract: We describe an efficient integer squaring algorithm (involving the fast Fourier transform modulo $F_8$) that was used on a 486 computer to discover a large pair of twin primes.
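
The general idea of squaring an integer with a transform over a Fermat-number ring can be sketched as follows. Several things here are assumptions rather than details from the paper: the modulus in the abstract is read as the Fermat number $F_8 = 2^{256} + 1$, the digit size and transform length are arbitrary choices, and the transform is written as a naive $O(n^2)$ sum instead of a fast butterfly network.

```python
F8 = 2**256 + 1  # assumption: the modulus is the Fermat number F_8


def transform(values, root):
    """Naive O(n^2) number-theoretic transform modulo F8."""
    n = len(values)
    return [
        sum(v * pow(root, j * k, F8) for j, v in enumerate(values)) % F8
        for k in range(n)
    ]


def square(x, digit_bits=64, n=16):
    """Square a non-negative integer via cyclic convolution of its digits.

    n must be a power of two dividing 512; digit_bits and n are
    illustrative choices, not values taken from the paper.
    """
    base = 1 << digit_bits
    digits = []
    rest = x
    while rest:
        digits.append(rest % base)
        rest //= base
    # Padding to n >= 2 * len(digits) makes the cyclic convolution equal the
    # ordinary one; every coefficient must also stay below the modulus.
    if 2 * len(digits) > n or len(digits) * base * base >= F8:
        raise ValueError("input too large for these parameters")
    digits += [0] * (n - len(digits))

    # 2 has multiplicative order 512 modulo F8 because 2**256 = -1 (mod F8),
    # so this power of 2 is an n-th root of unity with root**(n/2) = -1.
    root = pow(2, 512 // n, F8)
    spectrum = transform(digits, root)
    squared = [s * s % F8 for s in spectrum]          # pointwise squaring
    conv = transform(squared, pow(root, -1, F8))      # inverse, up to 1/n
    inv_n = pow(n, -1, F8)
    # Recombine the convolution coefficients, carrying through the shifts.
    return sum((c * inv_n % F8) << (digit_bits * i) for i, c in enumerate(conv))
```

Because the roots of unity are powers of 2, multiplications by them reduce to shifts in a fast implementation, and all arithmetic is exact, which avoids the rounding concerns of a floating-point FFT.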

Proceedings ArticleDOI
20 Aug 1997
TL;DR: It is shown that a fast Fourier transform method of generating time series samples of stationary, zero-mean, correlated Gaussian noise typically requires an order of magnitude fewer operations and memory elements than other algorithms.
Abstract: Rapid generation of time series samples of stationary, zero-mean, correlated Gaussian noise will accelerate digital communication system simulations. In this paper, we show that a fast Fourier transform (FFT) method of generating such samples typically requires an order of magnitude fewer operations and memory elements than other algorithms.