
Showing papers on "Twiddle factor" published in 1992


Journal ArticleDOI
TL;DR: In this paper, a multibank address assignment for an arbitrary fixed radix fast Fourier transform (FFT) algorithm suitable for high-speed single-chip implementation is developed, which is memory-bank conflict-free to allow simultaneous access to all the data needed for calculation of each of the radix r butterflies as they occur in the algorithm.
Abstract: A multibank memory address assignment for an arbitrary fixed radix fast Fourier transform (FFT) algorithm suitable for high-speed single-chip implementation is developed. The memory assignment is 'in place' to minimize memory size and is memory-bank conflict-free to allow simultaneous access to all the data needed for calculation of each of the radix r butterflies as they occur in the algorithm. Address generation for table lookup of twiddle factors is also included. The data and twiddle factor address generation hardware is shown to have small size and high speed.

131 citations
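
As a rough illustration of what a conflict-free, in-place bank assignment can look like, the sketch below uses a digit-sum rule: write each data index in base r and take the sum of its digits modulo r as the bank number. This rule is an assumption chosen for illustration and is not taken from the abstract above; all function names are hypothetical. It assumes a transform of length r^n in which each in-place butterfly touches the r indices that differ in exactly one base-r digit.

```python
def base_digits(index, radix, num_digits):
    """Return the base-`radix` digits of `index`, least significant first."""
    digits = []
    for _ in range(num_digits):
        digits.append(index % radix)
        index //= radix
    return digits


def bank_of(index, radix, num_digits):
    """Bank number under the assumed digit-sum rule."""
    return sum(base_digits(index, radix, num_digits)) % radix


def address_in_bank(index, radix):
    """Address inside the bank: the index with its lowest digit dropped."""
    return index // radix


def butterfly_indices(base, stage, radix):
    """Indices of one in-place butterfly at `stage`.

    `base` must have a zero in digit position `stage`; that digit then
    runs over 0..radix-1.
    """
    stride = radix ** stage
    return [base + d * stride for d in range(radix)]


def conflict_free(radix, num_digits):
    """Check that every butterfly at every stage hits `radix` distinct banks."""
    size = radix ** num_digits
    for stage in range(num_digits):
        for base in range(size):
            if base_digits(base, radix, num_digits)[stage] != 0:
                continue  # not the first index of a butterfly at this stage
            banks = {bank_of(i, radix, num_digits)
                     for i in butterfly_indices(base, stage, radix)}
            if len(banks) != radix:
                return False
    return True
```

Because the r indices of a butterfly differ in a single digit that takes every value from 0 to r-1, their digit sums are distinct modulo r, so under this rule all r operands sit in different banks.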


Journal ArticleDOI
TL;DR: The sliding fast Fourier transform is reviewed and is shown to have the computational complexity of $N$ complex multiplications per sample, as opposed to the well-cited assumption of $(N/2)\log_2 N$ complex multiplications per sample.
Abstract: The sliding fast Fourier transform (FFT) is reviewed and is shown to have the computational complexity of $N$ complex multiplications per sample, as opposed to the well-cited assumption of $(N/2)\log_2 N$ complex multiplications per sample reported in a book by L.R. Rabiner and B. Gold (1975).

67 citations
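
To make the per-sample cost concrete, here is a minimal sketch of the recursive sliding DFT update, a related technique that also needs one complex multiplication per frequency bin (N per new sample). It is not claimed to be the sliding FFT structure analysed in the paper above; the function names and the test signal are made up for illustration.

```python
import cmath


def dft(window):
    """Direct DFT of one window, used only as a reference."""
    n = len(window)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(window))
            for k in range(n)]


def slide(spectrum, oldest, newest):
    """Advance the window by one sample.

    Each bin is updated with one complex multiplication by a twiddle
    factor, so a new sample costs N complex multiplications in total.
    """
    n = len(spectrum)
    return [(spectrum[k] - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)
            for k in range(n)]


# Hypothetical input: slide a length-4 window across a short signal.
signal = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0, 1.0, -2.0, 0.0]
n = 4
spectrum = dft(signal[:n])
for start in range(1, len(signal) - n + 1):
    spectrum = slide(spectrum, signal[start - 1], signal[start + n - 1])
```

After the loop, `spectrum` should agree, up to rounding, with a direct DFT of the last four samples.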


Journal ArticleDOI
TL;DR: A method for performing fast Fourier transforms of various sizes simultaneously in one pipeline processor that consists of several stages of butterfly computational elements alternated with delay-switch-delay modules that reorder the data between the butterfly stages.
Abstract: A method for performing fast Fourier transforms (FFTs) of various sizes simultaneously in one pipeline processor is described. The processor consists of several stages of butterfly computational elements alternated with delay-switch-delay (DSD) modules that reorder the data between the butterfly stages. By properly ordering the input data to the pipeline and the butterfly twiddle factors, the DSD operations in the pipeline can be performed in any desired sequence, enhancing fault tolerance in case of a partial failure in one or more of the DSD modules. If one of the DSDs is no longer capable of operating in its prescribed mode, it is assigned a different operating mode. All the required changes are performed by software control. It is shown that any mixture of FFTs whose sizes are powers of the pipeline's radix can be performed. For FFTs of radix 2, radix 4, and mixed 2 and 4, the principles of operation are explained and examples of timing diagrams are given.

62 citations


Journal ArticleDOI
01 Nov 1992
TL;DR: An implementation of the Cooley-Tukey complex-to-complex FFT on the Connection Machine is described, which is designed to make effective use of the communications bandwidth of the architecture, its memory bandwidth, and storage with precomputed twiddle factors.
Abstract: We describe an implementation of the Cooley-Tukey complex-to-complex FFT on the Connection Machine. The implementation is designed to make effective use of the communications bandwidth of the architecture, its memory bandwidth, and storage with precomputed twiddle factors. The peak data motion rate that is achieved for the interprocessor communication stages is in excess of 7 Gbytes/s for a Connection Machine system CM-200 with 2048 floating-point processors. The peak rate of FFT computations local to a processor is 12.9 Gflops/s in 32-bit precision and 10.7 Gflops/s in 64-bit precision. The same FFT routine is used to perform both one- and multi-dimensional FFTs without any explicit data rearrangement. The peak performance for a one-dimensional FFT on data distributed over all processors is 5.4 Gflops/s in 32-bit precision and 3.2 Gflops/s in 64-bit precision. The peak performance for square, two-dimensional transforms is 3.1 Gflops/s in 32-bit precision, and for cubic, three-dimensional transforms the peak is 2.0 Gflops/s in 64-bit precision. Certain oblong shapes yield better performance. The number of twiddle factors stored in each processor is $P/2N + \log_2 N$ for an FFT on $P$ complex points uniformly distributed among $N$ processors. To achieve this level of storage efficiency, we show that a decimation-in-time FFT is required for normal-order input, and a decimation-in-frequency FFT is required for bit-reversed input order.

57 citations


Journal ArticleDOI
TL;DR: By introducing a general approach for constructing the fast Hartley transform (FHT) from the corresponding FFT, new vector- and split-vector-radix FHT algorithms with the same desirable properties as their FFT counterparts are obtained.
Abstract: The split-radix approach for computing the discrete Fourier transform (DFT) is extended for the vector-radix fast Fourier transform (FFT) to two and higher dimensions. It is obtained by further splitting the $(N/2 \times N/2)$ transforms with twiddle factors in the radix-$(2 \times 2)$ FFT algorithm. The generalization of this split vector-radix FFT algorithm to higher radices and higher dimensions is also presented. By introducing a general approach for constructing the fast Hartley transform (FHT) from the corresponding FFT, new vector- and split-vector-radix FHT algorithms with the same desirable properties as their FFT counterparts are obtained.

56 citations


Journal ArticleDOI
TL;DR: In this paper, a generalized prime factor fast Fourier transform (FFT) algorithm is proposed, which is simultaneously self-sorting and in-place and has a lower operation count than conventional FFT algorithms.
Abstract: Prime factor fast Fourier transform (FFT) algorithms have two important advantages: they can be simultaneously self-sorting and in-place, and they have a lower operation count than conventional FFT algorithms. The major disadvantage of the prime factor FFT has been that it was only applicable to a limited set of values of the transform length $N$. This paper presents a generalized prime factor FFT, which is applicable for any $N = 2^p 3^q 5^r$, while maintaining both the self-sorting in-place capability and the lower operation count. Timing experiments on the Cray Y-MP demonstrate the advantages of the new algorithm.

54 citations
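
For background, the classical prime factor decomposition (Good's index mapping) can be sketched as follows for a length N = N1 * N2 with N1 and N2 coprime. This is only the textbook two-factor mapping, not the generalized algorithm of the paper above, and it is neither in-place nor optimized; the helper names are hypothetical and direct DFTs stand in for the short transforms. The point it shows is that no twiddle factor multiplications are needed between the two stages.

```python
import cmath


def small_dft(x):
    """Direct DFT, standing in for an optimized short-length transform."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def prime_factor_dft(x, n1, n2):
    """DFT of length n1*n2 (n1, n2 coprime) via Good's index mapping."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Modular inverses (Python 3.8+); pow raises ValueError if not coprime.
    t1 = pow(n2, -1, n1)
    t2 = pow(n1, -1, n2)
    # Input map: element (a, b) of the grid is x[(n2*a + n1*b) mod n].
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Length-n2 transforms along rows, then length-n1 along columns,
    # with no twiddle multiplications in between.
    grid = [small_dft(row) for row in grid]
    cols = [small_dft([grid[a][b] for a in range(n1)]) for b in range(n2)]
    # Output map (Chinese remainder theorem): cols[k2][k1] -> X[k].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * t1 * k1 + n1 * t2 * k2) % n] = cols[k2][k1]
    return out
```

For example, `prime_factor_dft(x, 3, 5)` should match `small_dft(x)` for any length-15 input, up to rounding.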


Proceedings ArticleDOI
01 Dec 1992
TL;DR: It is shown that multiplication of two N-bit integers can be performed in O(1) time on an $N \times N$ reconfigurable mesh, and that the result can be extended to provide area-time tradeoffs in the usual bit model of VLSI to satisfy $AT^2$ optimality over 1
Abstract: It is shown that multiplication of two N-bit integers can be performed in O(1) time on an $N \times N$ reconfigurable mesh. This result is obtained by combining the O(1) time multiplication algorithm on an $N \times N^2$ reconfigurable mesh, the Rader transform, and decomposition of one-dimensional convolution into multidimensional convolution. Choosing the Rader transform at the expense of long word length frees one from storing twiddle factors in advance, which is needed in other designs. It is also shown that the present algorithm can be simulated on other restricted reconfigurable mesh models without asymptotic increase in time or number of processing elements. It is shown that the present result can be extended to provide area-time tradeoffs in the usual bit model of VLSI to satisfy $AT^2$ optimality over 1

36 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a distributed memory architecture for the fast Fourier transform (FFT) on a Connection Machine system, where all FFT stages can be performed concurrently, the input data is in normal order, and data allocation is consecutive.

18 citations


Journal ArticleDOI
TL;DR: The main idea of the algorithm, which concerns the grouping of twiddle factors into complex-conjugate pairs, holds true on the whole, but it does not allow a reduction of real multiplications and additions as compared with FFT algorithms of a split-radix type, as the authors assert.
Abstract: For the original article see ibid., vol.25, no.5, p.324-5 (1989). Kamar and Elcherif have proposed an algorithm for computation of the discrete Fourier transform which they called a conjugate pair fast Fourier transform (CPFFT). The main idea of the algorithm, which concerns the grouping of twiddle factors into complex-conjugate pairs, holds true on the whole. However, the commenters point out that it does not allow a reduction of real multiplications and additions as compared with FFT algorithms of a split-radix type, as the authors assert.

15 citations


Journal ArticleDOI
TL;DR: A relationship between the range of the twiddle factor and the dimension of the discrete cosine transform is first derived, whence a suitable scaling model is chosen for the DCT algorithm and the average output signal-to-noise ratio is calculated.
Abstract: In this paper, a fixed-point round-off error analysis of the discrete cosine transform (DCT) has been carried out. A relationship between the range of the twiddle factor and the dimension of the DCT is first derived, whence a suitable scaling model is chosen for the DCT algorithm and the average output signal-to-noise ratio is calculated.

9 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: A very efficient algorithm for computing the discrete Fourier transform (DFT) of real-symmetric input is presented, based on Bruun's algorithm, which achieves the same low arithmetic as the split-radix FFT for real-symmetric data, but has a structure that is as simple as the radix-2.
Abstract: A very efficient algorithm for computing the discrete Fourier transform (DFT) of real-symmetric input is presented. The algorithm is based on Bruun's algorithm where, except for the last stage, all twiddle factors are purely real. It is well known that about half of the arithmetic operations and memory requirements can be removed when the input is real-valued. It may be assumed that another half of the computational and memory requirements can be eliminated when the input is real and symmetric. This is, however, impossible with a standard radix-2 fast Fourier transform (FFT), but can be achieved by the Bruun algorithm. The symmetries within the algorithm for real-symmetric input are exploited to remove about three-fourths of the butterflies and memory locations. The algorithm presented achieves the same low arithmetic as the split-radix FFT for real-symmetric data, but has a structure that is as simple as the radix-2. The implementation on the TMS320C30 shows that the new algorithm fits a DSP processor very well. The program requires 0.51-0.60 ms to compute a length-1024 FFT with real-symmetric data.

Journal ArticleDOI
TL;DR: A set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented and it is shown that they achieve higher performance than previously measured on these machines.
Abstract: In this paper, a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum-length vectors throughout all stages of small- to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that on the CRAY-2 they can later be fetched from local memory in parallel with common memory accesses. Performance results for Fortran implementations using these techniques demonstrate that they are faster than Cray's library FFT routine CFFT2. The actual speedups obtained, which depend on the size of the FFT being computed and the supercomputer being used, range from about 5% to over 300%.

Proceedings ArticleDOI
25 May 1992
TL;DR: This paper shows a fast implementation method for a two-dimensional (2D) filter based on fast convolution using a fast Fourier transform (FFT) lookup table, which gives accurate selection of the desired frequency.
Abstract: This paper shows a fast implementation method for a two-dimensional (2D) filter. The filter design is based on fast convolution using a fast Fourier transform (FFT) lookup table. This is then extended to the 2D FFT. The design and implementation of the system gives accurate selection of the desired frequency. The system has the advantage of a reduction in the operation time.

Proceedings ArticleDOI
11 Nov 1992
TL;DR: Hardware algorithms for one-dimensional fast Fourier transform (FFT) computation on an 8-neighbor processor array are presented and two data mapping methods and algorithms are shown: the algorithm for similarity allocation and the algorithm for superposition allocation.
Abstract: Hardware algorithms for one-dimensional fast Fourier transform (FFT) computation on an 8-neighbor processor array are presented. These algorithms achieve high-speed FFT computation by combining the radix-4 butterfly computation with the communication capabilities of the 8-neighbor processor array. Three algorithms are considered. Two data mapping methods and algorithms are shown: the algorithm for similarity allocation and the algorithm for superposition allocation. The radix-4 and the radix-2 FFT algorithms are compared and evaluated.

Journal ArticleDOI
TL;DR: Based on some theorems of number theory, a new algorithm for computing the FFT (with power-of-two length) is proposed, which is recursive in nature, and thus the computation structure is rather regular.
Abstract: Since the discovery of the fast Fourier transform (FFT), many new FFT algorithms have been developed. Conventionally, the convolution-based approach deals with prime-length discrete Fourier transforms. In this paper, based on some theorems of number theory, a new algorithm for computing the FFT (with power-of-two length) is proposed. This novel recursive algorithm contains three stages: the first and the last stages contain only additions and subtractions, and the second stage is of block diagonal form, with each block being a circular correlation/convolution matrix. The newly proposed convolution-based FFT algorithm has the following advantages: (1) in terms of computational counts, it can achieve the multiplicative lower bound derived by Winograd; (2) it can easily be implemented in a parallel computing environment; (3) it is recursive in nature, and thus the computation structure is rather regular.

Journal ArticleDOI
TL;DR: In this paper, an algorithm for the computation of a conformal mapping discretized on a non-uniformly spaced point set, useful for the numerical solution of many problems of fluid dynamics is presented.

Book ChapterDOI
01 Jan 1992
TL;DR: A novel pipeline extension technique is introduced to eliminate the overheads in performing multiple transforms leading to a significant reduction in the execution time.
Abstract: This paper describes the software implementation of the 1D and 2D fast Fourier transform (FFT) on the DSP 96002 digital signal processor. Two real-valued programs based on FFT packing and a real-valued FFT algorithm are introduced. A novel pipeline extension technique is introduced to eliminate the overheads (such as twiddle factor loading, updating of address pointers, etc.) in performing multiple transforms, leading to a significant reduction in the execution time. The effectiveness of the approach is also demonstrated in the 2D FFT implementation. The row-column method is chosen because it exhibits the highest degree of parallelism and is simple and efficient to implement. Timing results for these algorithms are also given.
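
The row-column method mentioned in the abstract above can be sketched in a few lines: transform every row with a 1D transform, then transform every column of the result. This is a minimal illustration, not the DSP 96002 implementation; a direct 1D DFT stands in for the FFT for brevity, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct 1D DFT, standing in for a 1D FFT routine."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def dft2_row_column(image):
    """2D DFT by the row-column method."""
    # Pass 1: 1D transform of every row (rows are independent of each other).
    rows_done = [dft(row) for row in image]
    # Pass 2: 1D transform of every column of the row-transformed data.
    cols_done = [dft(list(col)) for col in zip(*rows_done)]
    # cols_done is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*cols_done)]


def dft2_direct(image):
    """Reference 2D DFT straight from the definition."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[r][c]
                 * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]
```

All row transforms in the first pass are independent of one another, as are all column transforms in the second, which is the parallelism the abstract refers to.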