Showing papers on "Prime-factor FFT algorithm published in 1990"


Journal ArticleDOI
TL;DR: Note: V. Madisetti, D. B. Williams, Eds.

862 citations


Book ChapterDOI
01 Jan 1990
TL;DR: This chapter presents the discrete cosine transform (DCT) and its fast algorithms, including the real arithmetic and recursive algorithm of Chen, Smith, and Fralick, a major breakthrough over the original FFT-based DCT algorithm.
Abstract: This chapter presents the discrete cosine transform. The development of fast algorithms for efficient implementation of the discrete Fourier transform (DFT) by Cooley and Tukey in 1965 has led to phenomenal growth in its applications in digital signal processing (DSP). The discovery of the discrete cosine transform (DCT) in 1974 has had a significant impact on the DSP field. While the original DCT algorithm is based on the FFT, a real arithmetic and recursive algorithm, developed by Chen, Smith, and Fralick in 1977, was the major breakthrough in the efficient implementation of the DCT. A less well-known but equally efficient algorithm was developed by Corrington. Subsequently, other algorithms, such as the decimation-in-time (DIT), decimation-in-frequency (DIF), split radix, DCT via other discrete transforms such as the discrete Hartley transform (DHT) or the Walsh-Hadamard transform (WHT), prime factor algorithm (PFA), a fast recursive algorithm, and planar rotations, which concentrate on reducing the computational complexity and/or improving the structural simplicity, have been developed. The dramatic development of DCT-based DSP is by no means an accident.

382 citations


Journal ArticleDOI
TL;DR: The method treats Fast Fourier Transforms of multichannel EEGs so that they can be used for intracerebral source localizations and finds the least square deviation sum between the entry positions and their orthogonal projections onto the straight line.

122 citations


Journal ArticleDOI
TL;DR: A fast recursive algorithm for the discrete sine transform (DST) is developed that can be considered as a generalization of the Cooley-Tukey FFT (fast Fourier transform) algorithm.
Abstract: A fast recursive algorithm for the discrete sine transform (DST) is developed. An N-point DST can be generated from two identical N/2-point DSTs. Besides being recursive, this algorithm requires fewer multipliers and adders than other DST algorithms. It can be considered as a generalization of the Cooley-Tukey FFT (fast Fourier transform) algorithm. The structure of the algorithm is suitable for VLSI implementation.

80 citations


Journal ArticleDOI
01 Dec 1990
TL;DR: By means of the Kronecker matrix product representation, the 1-D algorithms introduced in the paper can readily be generalised to compute transforms of higher dimensions and are more stable than and have fewer arithmetic operations than similar algorithms proposed by Yip and Rao.
Abstract: According to Wang, there are four different types of DCT (discrete cosine transform) and DST (discrete sine transform) and the computation of these sinusoidal transforms can be reduced to the computation of the type-IV DCT. As the algorithms involve different sizes of transforms at different stages they are not so regular in structure. Lee has developed a fast cosine transform (FCT) algorithm for DCT-III similar to the decimation-in-time (DIT) Cooley–Tukey fast Fourier transform (FFT) with a regular structure. A disadvantage of this algorithm is that it involves the division of the trigonometric coefficients and may be numerically unstable. Recently, Hou has developed an algorithm for DCT-II which is similar to a decimation-in-frequency (DIF) algorithm and is numerically stable. However, an index mapping is needed to transform the DCT to a phase-modulated discrete Fourier transform (DFT), which may not be performed in-place. In the paper, a variant of Hou's algorithm is presented which is both in-place and numerically stable. The method is then generalised to compute the entire class of discrete sinusoidal transforms. By making use of the DIT and DIF concepts and the orthogonal properties of the DCTs, it is shown that simple algebraic formulations of these algorithms can readily be obtained. The resulting algorithms are regular in structure and are more stable than and have fewer arithmetic operations than similar algorithms proposed by Yip and Rao. By means of the Kronecker matrix product representation, the 1-D algorithms introduced in the paper can readily be generalised to compute transforms of higher dimensions. These algorithms, which can be viewed as the vector-radix generalisation of the present algorithms, share the in-place and regular structure of their 1-D counterparts.

79 citations


Patent
06 Jul 1990
TL;DR: In this paper, a modular, arrayable, FFT processor for performing a preselected N-point FFT algorithm is presented, which uses an input memory to receive and store data from a plurality of signal-input lines, and to store intermediate butterfly results.
Abstract: A modular, arrayable, FFT processor for performing a preselected N-point FFT algorithm. The processor uses an input memory to receive and store data from a plurality of signal-input lines, and to store intermediate butterfly results. At least one Direct Fourier Transformation (DFT) element selectively performs R-point direct Fourier transformations on the stored data according to the FFT algorithm. Arithmetic logic elements connected in series with the DFT stage perform required phase adjustment multiplications and accumulate complex data and multiplication products for transformation summations. Accumulated products and summations are transferred to the input memory for storage as intermediate butterfly results, or to an output memory for transfer to a plurality of output lines. At least one adjusted twiddle-factor storage element provides phase adjusting twiddle-factor coefficients for implementation of the FFT algorithm. The coefficients are preselected according to a desired size for the Fourier transformation and a relative array position of the arrayable FFT processor in an array of processors. The adjusted twiddle-factor coefficients are those required to compute all mixed power-of-two, power-of-three, power-of-four, and power-of-six FFTs up to a predetermined maximum-size FFT point value for the array which is equal to or greater than N.

51 citations


Journal ArticleDOI
01 Sep 1990
TL;DR: A parallelization of the Cooley-Tukey FFT algorithm is presented, implemented on a shared-memory MIMD (non-vector) machine that was built in the Dept. of Computer Science, Tel Aviv University.
Abstract: In this paper we present a parallelization of the Cooley-Tukey FFT algorithm that is implemented on a shared-memory MIMD (non-vector) machine that was built in the Dept. of Computer Science, Tel Aviv University. A parallel algorithm is presented for the one-dimensional Fourier transform, with performance analysis. For a large array of complex numbers to be transformed, an almost linear speed-up is demonstrated. This algorithm can be executed by any number of processors, but generally the number is much less than the length of the input data.

40 citations
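
For orientation, the Cooley-Tukey decomposition that the entry above parallelizes can be sketched as a recursive radix-2 decimation-in-time FFT. This is a minimal serial illustration, not the Tel Aviv implementation; the function name `fft_radix2` is hypothetical and the power-of-two length restriction is an assumption of this sketch. The two half-length sub-transforms are independent of each other, which is one place where a parallel version could distribute work.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # The two half-length transforms below do not depend on each other.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    half = n // 2
    for k in range(half):
        # Twiddle factor W_n^k = exp(-2*pi*i*k/n) applied to the odd branch.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

For example, `fft_radix2([1, 0, 0, 0])` returns four values equal to 1, the flat spectrum of a unit impulse.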


Journal ArticleDOI
TL;DR: An algorithm for a generalized Chebyshev interpolation procedure, increasing the number of sample points more moderately than doubling, is presented, and numerical comparison with other existing algorithms is given.
Abstract: An algorithm for a generalized Chebyshev interpolation procedure, increasing the number of sample points more moderately than doubling, is presented. The FFT for a real sequence is incorporated into the algorithm to enhance its efficiency. Numerical comparison with other existing algorithms is given.

35 citations


Journal ArticleDOI
01 May 1990
TL;DR: A novel systolic implementation of the row-column method for solving the prime factor discrete Fourier transform (DFT) algorithm, which deals with the two-factor decomposition where the transform length N is an odd multiple of 4.
Abstract: The paper discusses a novel systolic implementation of the row-column method for solving the prime factor discrete Fourier transform (DFT) algorithm. It deals, in particular, with the two-factor decomposition where the transform length N is an odd multiple of 4. By processing the four-point row-DFTs coefficient by coefficient, rather than DFT by DFT, as is conventionally done, it is seen how pipelined implementations of the row-DFT and column-DFT processes can be performed simultaneously, without need for matrix transposition of the row-DFT output, resulting in a fully pipelined concurrent solution. Hardware efficiency and simplicity are achieved via the computationally attractive CORDIC (co-ordinate rotation digital computer) arithmetic, with O(N) throughput requiring (asymptotically) one-quarter of the hardware requirements of established N-processor solutions.

29 citations


Journal ArticleDOI
TL;DR: In this article, a 2-D systolic array algorithm for the discrete cosine transform (DCT) is presented, which is based on the inverse discrete Fourier transform (DFT) version of the Goertzel algorithm via Horner's rule.
Abstract: A 2-D systolic array algorithm for the discrete cosine transform (DCT) is presented. It is based on the inverse discrete Fourier transform (DFT) version of the Goertzel algorithm via Horner's rule. This array requires N cells and multipliers, takes square root N+2 clock cycles to produce a complete N-point DCT, and is able to process a continuous stream of data sequences.

29 citations
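
The Goertzel recurrence mentioned in the entry above can be illustrated for a single DFT bin. The sketch below is the textbook second-order Goertzel form for a forward DFT, not the paper's systolic DCT array or its inverse-DFT variant; the names `goertzel_bin` and `dft_bin` are hypothetical.

```python
import cmath
import math


def goertzel_bin(x, k):
    """Compute DFT bin k of the sequence x with the Goertzel recurrence."""
    n = len(x)
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        # s[m] = x[m] + 2*cos(omega)*s[m-1] - s[m-2]
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # X[k] = exp(i*omega)*s[n-1] - s[n-2]
    return cmath.exp(1j * omega) * s_prev - s_prev2


def dft_bin(x, k):
    """Reference value: direct evaluation of the DFT sum for bin k."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(x))
```

The recurrence uses one real multiplication per input sample, with a single complex multiplication at the end, which is why it maps naturally onto a chain of identical cells.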


Journal ArticleDOI
TL;DR: A new bit reversal permutation algorithm is described that allows for precomputation of seed tables up to one higher power of two than Evans' algorithm.
Abstract: A new bit reversal permutation algorithm is described. Such algorithms are needed for radix 2 (or radix B) fast Fourier transforms (FFTs) or fast Hartley transforms (FHTs). This algorithm is an alternative to one described by Evans (1987). A BASIC version of this algorithm ran slightly faster than the BASIC version of Evans' algorithm given by Bracewell (1986), with some time savings for odd powers of two. This new algorithm also allows for precomputation of seed tables up to one higher power of two than Evans' algorithm.
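
To make the permutation itself concrete, here is a minimal sketch that builds a radix-2 bit-reversal table with a simple recurrence. It is neither the algorithm of this paper nor Evans' seed-table method, only a common way to produce the same permutation; `bit_reverse_permutation` is a hypothetical name.

```python
def bit_reverse_permutation(n):
    """Return p where p[i] is i with its log2(n) bits written in reverse order."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    perm = [0] * n
    for i in range(1, n):
        # Reuse the reversal of i >> 1: shift it right by one place,
        # then put the lowest bit of i into the highest position.
        perm[i] = (perm[i >> 1] >> 1) | ((i & 1) << (bits - 1))
    return perm
```

For n = 8 this gives `[0, 4, 2, 6, 1, 5, 3, 7]`; applying the permutation twice restores the original order.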

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The problem of comparing different algorithms for the execution of the fast Fourier transform (FFT) is considered by using the necessary number of instruction cycles for an FFT implementation on different digital signal processors (DSPs) as a measure.
Abstract: The problem of comparing different algorithms for the execution of the fast Fourier transform (FFT) is considered. Instead of counting the required arithmetic operations, the necessary number of instruction cycles for an FFT implementation on different digital signal processors (DSPs) is used as a measure. It turns out that this more practical figure of merit yields a rather different valuation of the algorithms. Furthermore, a method to halve the table size for the radix-2 twiddle factors is described. Some new FFT programs for execution on DSPs are compared with programs provided by the manufacturers.

Journal ArticleDOI
TL;DR: In this paper, a novel technique is presented for computing the scattering by two-dimensional structures of arbitrary inhomogeneity, which combines the usual finite-element (FE) method with the boundary-integral equation to formulate a discrete system.
Abstract: A novel technique is presented for computing the scattering by two-dimensional structures of arbitrary inhomogeneity. The proposed approach combines the usual finite-element (FE) method with the boundary-integral equation to formulate a discrete system. This is subsequently solved via the conjugate gradient (CG) algorithm. A particular characteristic of the method is the use of rectangular boundaries to enclose the scatterer. Several of the resulting boundary integrals are then convolutions and can be evaluated via the fast Fourier transform (FFT) in the implementation of the CG algorithm. The solution approach presented here offers the principal advantage of having O(N) memory demand and employs a one-dimensional FFT, as against the two-dimensional FFT required in a traditional implementation of the proposed CG-FFT algorithm. The speed of the proposed solution method is compared with that of the traditional CG-FFT algorithm. Results are presented for several rectangular composite cylinders and one perfectly conducting cylinder. These are shown to be in excellent agreement with the moment method.

Journal ArticleDOI
TL;DR: A fast Fourier transform (FFT) routine written in assembly and derived from a radix-4 algorithm is realized; the routine is autogenerated, i.e. modified by another algorithm running off-line according to the number of FFT points.

Proceedings ArticleDOI
05 Sep 1990
TL;DR: A computationally balanced arithmetic Fourier transform (AFT) algorithm for Fourier analysis and signal processing is presented in this paper, which uses a butterfly structure which reduces the number of additions by 25%.
Abstract: The arithmetic Fourier transform (AFT) is a number-theoretic approach to Fourier analysis which has been shown to perform competitively with the classical fast Fourier transform (FFT) in terms of accuracy, complexity and speed. Theorems developed previously for the AFT algorithm are used to derive the original AFT algorithm which Bruns found in 1903. This is shown to yield an algorithm of less complexity and of improved performance over certain recent AFT algorithms. A computationally balanced AFT algorithm for Fourier analysis and signal processing is developed. This algorithm does not require complex multiplications. A VLSI architecture is suggested for this amplified AFT algorithm. This architecture uses a butterfly structure which reduces the number of additions by 25% over that used by the direct method. This efficient AFT algorithm is shown to be identical to Bruns' original AFT algorithm.

Journal ArticleDOI
01 Jul 1990
TL;DR: The paper presents the in-place implementation of the multidimensional radix 2 fast Fourier transform (FFT), along with the corresponding algorithm for data shuffling (bit-reversal) on SIMD hypercube computers.
Abstract: The paper presents the in-place implementation of the multidimensional radix 2 fast Fourier transform (FFT), along with the corresponding algorithm for data shuffling (bit-reversal) on SIMD hypercube computers. Each processor possesses its own non-shared memory, the number of processors being less than or equal to the number of data. The flexibility of the proposed algorithm is based on the scheme of information storage that has been chosen and in the decomposition/configuration of the hypercube in subhypercubes that allow the parallel processing of multiple one-dimensional FFTs. This parallel FFT algorithm has an optimum performance, since the data redundancy is null and the algorithmic complexity is optimum.

Journal ArticleDOI
TL;DR: A new algorithm is introduced such that a discrete cosine transform by correlations can be applied to any odd prime length DCT and is most suitable for VLSI implementation.
Abstract: A new algorithm is introduced such that we can realise a discrete cosine transform by correlations. This algorithm can be applied to any odd prime length DCT and is most suitable for VLSI implementation.

Journal ArticleDOI
TL;DR: Here it is shown that the CPFFT is the same as the SRFFT algorithm from the arithmetic complexity point of view.
Abstract: A recently introduced algorithm for the fast computation of the discrete Fourier transform, called conjugate pair fast Fourier transform (CPFFT), seemed to require a smaller number of real multiplications and additions than that required for the split radix (SR) FFT algorithm. Here it is shown that the CPFFT is the same as the SRFFT algorithm from the arithmetic complexity point of view.

Journal ArticleDOI
Y. Wu
TL;DR: The author offers a pipeline and a recirculated shuffle network implementation of the Bruun algorithm for computation of the discrete Fourier transform (DFT) based on the modified perfect shuffle network.
Abstract: In some signal processing applications, the input data are real. In this case, the Bruun algorithm for computation of the discrete Fourier transform (DFT) is attractive. The author offers a pipeline and a recirculated shuffle network implementation of the Bruun algorithm. The implementation of the parallel pipeline and recirculated FFT structures is based on the modified perfect shuffle network.

Journal ArticleDOI
TL;DR: It is shown that the short length DHTs used by the prime factor algorithm can be nested to lead to the Winograd Hartley transform algorithm.
Abstract: A prime factor algorithm for computing the discrete Hartley transform (DHT) is presented. It is shown that the short length DHTs used by the prime factor algorithm can be nested to lead to the Winograd Hartley transform algorithm.

Book ChapterDOI
01 Jan 1990
TL;DR: It is shown that, independently of any redundance which may exist in the initial system, Fourier's algorithm itself produces a number of redundant inequalities, and it is observed how each constraint is obtained (the construction) from the initial constraints.
Abstract: In the last century, Fourier provided an algorithm for solving linear constraints of the form q≥0. This algorithm relies on the elimination of variables and on the creation of new inequalities. A number of these new constraints are redundant (useless). We propose here an improvement of this algorithm. We show that, independently of any redundancy which may exist in the initial system, Fourier's algorithm itself produces a number of redundant inequalities. In order to highlight this fact, we observe how each constraint is obtained (the construction) from the initial constraints. We then define a minimality property on these constructions which allows us to avoid producing the inequalities that are redundant because of Fourier's algorithm. We show that the subset of constraints thus obtained is stable under the order of Fourier steps. We then characterise these minimal constructions, and provide an algorithm to detect them. This algorithm is second degree polynomial in time and memory space with the number of eliminated variables.

Journal ArticleDOI
TL;DR: In this paper, a fixed-point error analysis has been carried out for the fast Hartley transform (FHT) and the results are compared with the FFT error-analysis results.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A new class of FFT (fast Fourier transform) algorithms that run very efficiently on digital signal processors (DSPs) is described. The algorithms are shown to be more than 20% faster than traditional sequential algorithms adapted to the processor, owing to lower overhead and better utilization of the parallel instruction sets and pipelining.
Abstract: A new class of FFT (fast Fourier transform) algorithms that run very efficiently on digital signal processors (DSPs) is described. The algorithms are based on a tensor product factorization of the DFT (discrete Fourier transform). The tensor product factorization not only controls the breakdown into short-length DFTs but also shows the data flow between the various blocks. This allows a better scheduling of operations, which again gives a better utilization of the DSP pipelining/parallel capabilities, and leads to algorithms with significantly lower overhead than traditional methods. Several different programs have been implemented in assembly code for the TMS320C30 and simulated to find their execution times. The new algorithms are shown to be more than 20% faster than traditional sequential algorithms adapted to the processor, owing to lower overhead and better utilization of the parallel instruction sets and pipelining.

Proceedings ArticleDOI
24 Sep 1990
TL;DR: It is found that an odd length type II and III DWT can be mapped to a discrete Hartley transform (DHT) by means of a simple index mapping.
Abstract: New algorithms for computing the discrete W transform (DWT) of arbitrary lengths are presented. It is found that an odd length type II and III DWT can be mapped to a discrete Hartley transform (DHT) by means of a simple index mapping. The DHT or DWT-I can be computed, for example, by the real-valued fast Fourier transform algorithms such as the real-valued prime factor fast Fourier transform algorithm (RPFA FFT). Using the close relationship between the odd DFTs and the DWTs, it is possible to compute the type II and III DWTs with even lengths by means of the real-valued FFT or the fast Hartley transform (FHT). Similar algorithms are also presented for the DWT-IV.
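
The remark that the DHT can be computed with a real-input Fourier transform rests on a standard identity: for real x, H[k] = Re X[k] - Im X[k], where X is the DFT with kernel exp(-2*pi*i*n*k/N). The sketch below checks that identity with a direct O(N^2) DFT standing in for a fast real-valued FFT; it does not reproduce the paper's index mappings for the type II, III, or IV DWTs, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct DFT, used here only as a stand-in for a fast real-valued FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dht_direct(x):
    """Discrete Hartley transform from its definition, with cas(t) = cos(t) + sin(t)."""
    n = len(x)
    return [sum(x[j] * (math.cos(2 * math.pi * j * k / n)
                        + math.sin(2 * math.pi * j * k / n)) for j in range(n))
            for k in range(n)]


def dht_via_dft(x):
    """DHT of a real sequence obtained from its DFT: H[k] = Re X[k] - Im X[k]."""
    return [c.real - c.imag for c in dft(x)]
```

Both routes agree to rounding error for real input, so any fast real-valued FFT can replace `dft` in this sketch.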

Journal ArticleDOI
TL;DR: An attempt is made to explain as clearly as possible the problem of address generation and how the prime factor mapping technique is used in this class of algorithms.
Abstract: An attempt is made to explain as clearly as possible the problem of address generation and how the prime factor mapping technique is used in this class of algorithms. Two novel address generation schemes are proposed to improve efficiency. The first scheme reduces the computation required for unscrambling data in an in-place realization of the PFA (prime factor algorithm) by reducing the number of variables used to calculate the data addresses. The second scheme is to be used in an in-place in-order realization of PFA. It achieves high efficiency by replacing complicated modulo operations of conventional approaches by simple indirect addressing techniques. Making use of this scheme, software packages have been written for the computation of DFTs (discrete Fourier transforms) using a high-level language and two low-level languages (the 80286/287 and TMS320C25 assembly languages). Results of these realizations show that a reduction of 50% in address generation time is achievable, giving a saving of 30% in total computation time. A hardware address generator is also developed, which may provide clues to improving digital signal processor architectures in the future.
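
The prime factor mapping and its modulo-based address arithmetic can be sketched for two coprime factors. This is the conventional Good-Thomas formulation with explicit modulo operations, i.e. the baseline kind of address computation the abstract refers to, not either of the paper's two schemes. The names `pfa_dft` and `dft` are hypothetical, and `pow(a, -1, m)` for the modular inverse assumes Python 3.8 or later.

```python
import cmath
from math import gcd


def dft(x):
    """Direct short-length DFT used for the row and column transforms."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def pfa_dft(x, n1, n2):
    """Length n1*n2 DFT via the prime factor mapping (n1 and n2 coprime)."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with coprime n1, n2")
    # Input index map: n = (n2*i1 + n1*i2) mod N.
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)] for i1 in range(n1)]
    # Row DFTs of length n2, then column DFTs of length n1; no twiddle factors.
    rows = [dft(row) for row in grid]
    cols = [dft([rows[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]
    # Output index map from the Chinese remainder theorem.
    a = pow(n2, -1, n1)  # inverse of n2 modulo n1
    b = pow(n1, -1, n2)  # inverse of n1 modulo n2
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * a * k1 + n1 * b * k2) % n] = cols[k2][k1]
    return out
```

Every element access in this sketch costs a multiply-add and a modulo reduction, which is the overhead that table-driven or indirect addressing aims to avoid.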

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An efficient parallel architecture has been developed that can perform a 2-D Fourier transform in O((√N) N log √N) time and offers an attractive tradeoff between size and speedup with an improvement of processor performance/size by over a factor of five.
Abstract: An efficient parallel architecture has been developed that can perform a 2-D Fourier transform in O((√N) N log √N) time. The speedup is achieved through a decomposition of the 2-D Fourier transform into two smaller M*M 2-D transforms and a parallel implementation of the smaller transform. Memory bandwidth is not a problem in this architecture with a new memory partitioning strategy that successfully divides the large memory into N smaller, independent memories. The flexibility and modularity of the new 2-D FFT algorithm allow for a variety of sizes for the parallel 2-D FFT units. The decomposition of the 2-D FFT can be applied as many times as necessary until the right tradeoff between size and speed is obtained. The architecture offers an attractive tradeoff between size and speedup with an improvement of processor performance/size by over a factor of five.

Journal ArticleDOI
TL;DR: The proposed algorithm, like the WRT algorithm, is based on the one-dimensional fast Fourier transform (FFT) and, compared to the traditional ways of computing the multidimensional DFT, offers substantial savings in the number of one-dimensional FFT procedure calls.
Abstract: The authors generalize the weighted redundancy transform (WRT) algorithm for computing the multidimensional discrete Fourier transform (DFT) in the case in which the sample size (blocklength) is not the same on every axis. The proposed algorithm, like the WRT algorithm, is based on the one-dimensional fast Fourier transform (FFT) and, compared to the traditional ways of computing the multidimensional DFT, offers substantial savings in the number of one-dimensional FFT procedure calls. While the algorithm is applicable to transforms of any dimensions, only the two-dimensional case is explored in detail.

Journal ArticleDOI
TL;DR: A fast algorithm is presented for numerical evaluation of forward and inverse Radon transforms; by rewriting the transform as a convolution, a computational speed similar to that of the 2D fast Fourier transform is obtained.
Abstract: A fast algorithm is presented for numerical evaluation of forward and inverse Radon transforms. The algorithm does not perform exact one-to-one mapping as the discrete Fourier transform but, due to the use of band-limited basis functions, it is robust and sufficiently accurate for seismic applications. By rewriting the transform as a convolution, a computational speed is obtained similar to the speed of the 2D fast Fourier transform.

Journal ArticleDOI
01 Aug 1990
TL;DR: The paper presents a two-dimensional (2-D) recursive fast Fourier transform (RFFT), which consists of two recursive algorithms and a revised fast Fourier transform in the recursive process and is applied to image processing as an example of 2-D signal processing.
Abstract: The paper presents a two-dimensional (2-D) recursive fast Fourier transform (RFFT) which consists of two recursive algorithms and a revised fast Fourier transform in the recursive process. This algorithm is applied to image processing as an example of 2-D signal processing. Compared with a standard FFT, the RFFT has the advantages of not requiring the number of input data points to be equal to the number of discrete frequencies and of being suitable for online processing. Compared with other recursive Fourier transforms, the RFFT has a shorter computation time.

Journal ArticleDOI
01 Apr 1990
TL;DR: Very efficient application-specific realizations spanning a wide throughput range are proposed for both DFT and FFT algorithms, with novel single-cycle address computations for the FFT.
Abstract: Many Fourier transform applications have to operate at fixed sample rates in the low to medium range, especially in signal processing systems. Hence, in order to arrive at efficient implementations, hardware-sharing is required as in microcoded architectures. In this paper, very efficient application-specific realizations spanning a wide throughput range are proposed for both DFT and FFT algorithms. Novel single-cycle address computations are presented for the FFT to obtain these results. Trade-offs between the architectural alternatives are provided too. These designs have been used as test-vehicles for the architectural strategy in an automated synthesis tool-box tuned towards signal processing applications.