
Showing papers on "Split-radix FFT algorithm published in 1983"


Journal ArticleDOI
TL;DR: The implementation of the FFT on vector computers is described, and in the final section it is demonstrated how savings can be achieved in the case of two-dimensional transforms.

183 citations


Journal ArticleDOI
TL;DR: It is shown that the self-sorting variants of the mixed-radix FFT algorithm may be specialized to the case of real or conjugate-symmetric input data, and a multiple real/half-complex transform package on the Cray-1 achieves a 30% saving in CPU time compared with a package using conventional algorithms.

53 citations
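
The saving available for real input can be illustrated with a much simpler, standard trick than the self-sorting mixed-radix specialization the paper describes: pack a length-N real sequence into a length-N/2 complex sequence, transform once, and untangle the result. The sketch below is an assumption-laden illustration, not the paper's algorithm; the names `dft` and `real_dft_half` are hypothetical, and a direct O(N²) DFT stands in for the inner complex transform.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used here only as a stand-in inner transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def real_dft_half(x):
    """Half-spectrum X[0..N/2] of an even-length real sequence x.

    Packs even samples into the real part and odd samples into the
    imaginary part, so only one length-N/2 complex transform is needed.
    """
    n = len(x)
    if n == 0 or n % 2:
        raise ValueError("length must be even and non-zero")
    m = n // 2
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(m)]
    zf = dft(z)
    out = []
    for k in range(m + 1):
        zk = zf[k % m]
        zc = zf[(m - k) % m].conjugate()
        even = (zk + zc) / 2        # DFT of the even-indexed samples
        odd = (zk - zc) / 2j        # DFT of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

The remaining outputs follow from conjugate symmetry, X[N−k] = conj(X[k]), which is why a half-complex result format is sufficient.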


Journal ArticleDOI
TL;DR: A highly effective dynamic programming algorithm is presented as a solution to the problem of finding an algorithm from this class which is optimal with respect to the specific add, multiply, and data transfer characteristics of a particular implementation.
Abstract: A broad class of efficient discrete Fourier transform algorithms is developed by partitioning short DFT algorithms into factors. The factored short DFT's are combined into longer DFT's using multi-dimensional index maps. By exploiting a property which allows some of the factors to commute, a large set of possible DFT algorithms is generated which contains both the prime factor algorithm (PFA) and the Winograd Fourier transform algorithm (WFTA) as special cases. The problem of finding an algorithm from this class which is optimal with respect to the specific add, multiply, and data transfer characteristics of a particular implementation is posed, and a highly effective dynamic programming algorithm is presented as a solution.

42 citations
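
The multi-dimensional index maps mentioned in the abstract can be sketched for the simplest case, a two-factor prime factor algorithm with coprime lengths. This is a minimal illustration under stated assumptions, not the paper's factored short-DFT modules or its dynamic programming search: the function names are hypothetical and direct DFTs stand in for the short transforms.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT standing in for an optimized short-DFT module."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def pfa_dft(x, n1, n2):
    """Length n1*n2 DFT via a two-dimensional index map (n1, n2 coprime)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Modular inverses; pow(..., -1, m) needs Python 3.8+ and raises
    # ValueError when n1 and n2 are not coprime.
    a = pow(n2, -1, n1)
    b = pow(n1, -1, n2)
    # Input map: n = (n2*i1 + n1*i2) mod N.
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)] for i1 in range(n1)]
    # Length-n2 DFT along each row, then length-n1 DFT along each column.
    grid = [dft(row) for row in grid]
    cols = [dft([grid[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]
    # Output map (Chinese remainder theorem): no twiddle factors are needed.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * a * k1 + n1 * b * k2) % n] = cols[k2][k1]
    return out
```

Because the two index maps remove the twiddle factors between stages, the row and column transforms can be reordered freely, which is the kind of freedom an optimization over algorithm orderings can exploit.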


Patent
05 Jan 1983
TL;DR: A fast Fourier transform circuit formed on a single chip comprises a fast multiplier-accumulator circuit that, in the preferred embodiment, employs a modified form of Booth's algorithm, an adder circuit, a read-only memory for storing FFT twiddle factors, and a random access memory that holds a set of input complex quantities and receives intermediate and final results in an in-place FFT operation.
Abstract: A fast Fourier transform circuit formed on a single chip, including a fast multiplier-accumulator circuit which, in the preferred embodiment, employs a modified form of Booth's algorithm, an adder circuit, a read-only memory for storing FFT twiddle factors, and a random access memory for holding a set of input complex quantities and for receiving intermediate and final results in an in-place FFT operation. In the preferred embodiment, the FFT twiddle factors are stored in Booth's code for greater speed of operation. Control and timing circuitry on the same chip generates control signals and address codes in order to perform a sequence of butterfly computations by repeated use of the multiplier-accumulator and adder circuits, to generate FFT coefficients in the random access memory.

31 citations
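
The abstract does not say which modified form of Booth's algorithm the circuit uses. A common variant is radix-4 recoding, which turns a two's-complement multiplier into digits in {−2, …, 2} and so halves the number of partial products. The sketch below assumes that variant; the function names and the default 16-bit width are hypothetical and are not taken from the patent.

```python
def booth_radix4_digits(y, width):
    """Recode a two's-complement integer into radix-4 Booth digits.

    Returns digits d_i in {-2, -1, 0, 1, 2}, least significant first,
    such that y == sum(d_i * 4**i).
    """
    if width % 2:
        width += 1  # sign-extend to an even number of bits
    if not -(1 << (width - 1)) <= y < (1 << (width - 1)):
        raise ValueError("y does not fit in the given width")
    bits = y & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # bit to the right of the current pair (0 for the first pair)
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(x, y, width=16):
    """Multiply x by y using one shifted partial product per Booth digit."""
    acc = 0
    for i, d in enumerate(booth_radix4_digits(y, width)):
        acc += (d * x) << (2 * i)  # d * x is only a shift and/or negate
    return acc
```

Storing a constant such as a twiddle factor already in recoded form, as the abstract describes, would remove the recoding step from the multiply path.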


Journal ArticleDOI
TL;DR: A decimation-in-time radix-2 fast Fourier transform (FFT) algorithm is considered here for implementation in multiprocessors with shared bus, multistage interconnection network (MIN), and in mesh connected computers.
Abstract: A decimation-in-time radix-2 fast Fourier transform (FFT) algorithm is considered here for implementation in multiprocessors with shared bus, multistage interconnection network (MIN), and in mesh connected computers. Results are derived for data allocation, interprocessor communication, approximate computation time, and speedup of an N point FFT on any P available processing elements (PE's). Further generalization is obtained for a radix-r FFT algorithm. An N × N point two-dimensional discrete Fourier transform (DFT) implementation is also considered when one or more rows of the input data matrix are allocated to each PE.

27 citations
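
For reference, a serial decimation-in-time radix-2 FFT can be sketched as below. This is a plain recursive illustration, not the multiprocessor mapping analysed in the paper; the function name is hypothetical.

```python
import cmath

def fft_radix2_dit(x):
    """Recursive decimation-in-time radix-2 FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2_dit(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t            # butterfly, upper output
        out[k + n // 2] = even[k] - t   # butterfly, lower output
    return out
```

The N/2 butterflies inside each stage are independent of one another, which is the property that makes it possible to distribute them across processing elements.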


Journal ArticleDOI
01 Oct 1983
TL;DR: A description of the parallelism in the radix-2 pipeline FFT is presented, and it is shown that to obtain the required processing rate further parallel processing is necessary.
Abstract: The advantages of using digital convolution to implement a particular pulse compression radar filter are outlined. Using the bandwidth of the given filter, a simple calculation of the required computation rate indicates that considerable parallel computation would be necessary using existing integrated circuits. Some methods of computing the DFT are given and the FFT algorithm is chosen since its regular structure and in-place computation facilitate parallel computation. A description of the parallelism in the radix-2 pipeline FFT is presented, and it is shown that to obtain the required processing rate further parallel processing is necessary. By computing n butterflies in parallel at each stage of the FFT a family of parallel pipeline FFT processors is developed. Using the new architectures allows an increase in the processing speed while retaining the simple structure of the pipeline FFT. It is shown that for each value of N and n there are four canonic forms of equivalent computational complexity, but with different structures. The four forms arise from the two types of DIT FFT algorithm and the two methods of selecting the order in which the butterflies are computed. The connection between the FFT algorithm and the binary m-cube array is given, and is used to show by an example how the architectures presented fit between the normal pipeline FFT and the array processor in the amount of parallel computation involved. The radix-4 pipeline FFT is described and it is shown that this structure can also be paralleled in a similar way to the radix-2 pipeline FFT. The amount of hardware required to implement digital convolution using these architectures is discussed and examples are given. The balance between logic speed and integration density and the problems of interconnecting the computational elements are also discussed.

26 citations


Journal ArticleDOI
TL;DR: This work shows how to compute the discrete Fourier transform at n points with an optimal speed-up as long as the memory is large enough; the control is shown to be simple and easily implementable in VLSI.

16 citations


Journal ArticleDOI
TL;DR: It is shown that, under a VLSI model of computation, such a design requires the same asymptotic area and attains the same throughput as the corresponding network for the evaluation of a single N-element FFT.
Abstract: A network for the evaluation of the fast Fourier transform (FFT) is presented. Such a network is able to compute, in parallel, the FFT's of arbitrary partitions in powers of two of the N input elements. It is shown that, under a VLSI model of computation, such a design requires the same asymptotic area and attains the same throughput as the corresponding network for the evaluation of a single N-element FFT.

13 citations


Journal ArticleDOI
01 Oct 1983
TL;DR: It is shown that number theoretic transforms (NTT) can be used to compute the discrete Fourier transform (DFT) very efficiently and the total number of real multiplications for a length-P DFT is reduced to (P − 1).
Abstract: It is shown that number theoretic transforms (NTT) can be used to compute the discrete Fourier transform (DFT) very efficiently. By noting some simple properties of number theory and the DFT, the total number of real multiplications for a length-P DFT is reduced to (P − 1). This requires fewer than one real multiplication per point. For a proper choice of transform length and NTT, the number of shift adds per point is approximately the same as the number of additions required for FFT algorithms.
Indexing terms: Mathematical techniques, Transforms

13 citations
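
The abstract does not spell out its construction, so the sketch below shows only the general building block: an NTT over the integers modulo a prime, used to compute an exact cyclic convolution. The modulus 257 and root 4 are assumptions chosen for illustration (4 has order 8 modulo 257, and multiplying by a power of two is a shift), and the function names are hypothetical.

```python
def ntt(a, root, mod):
    """Direct number theoretic transform of a with the given root of unity."""
    n = len(a)
    return [sum(a[j] * pow(root, j * k, mod) for j in range(n)) % mod
            for k in range(n)]

def cyclic_convolve_ntt(a, b, root=4, mod=257):
    """Cyclic convolution of integer sequences a and b, reduced modulo mod.

    The result equals the true convolution whenever every true output
    value lies in the range 0 .. mod - 1.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("sequences must have equal length")
    # root must have multiplicative order exactly n modulo the prime mod.
    if pow(root, n, mod) != 1 or any(pow(root, d, mod) == 1 for d in range(1, n)):
        raise ValueError("root is not a primitive n-th root of unity")
    fa = ntt(a, root, mod)
    fb = ntt(b, root, mod)
    prod = [(u * v) % mod for u, v in zip(fa, fb)]  # pointwise products
    inv_n = pow(n, -1, mod)        # Python 3.8+ modular inverse
    inv_root = pow(root, -1, mod)
    return [(inv_n * v) % mod for v in ntt(prod, inv_root, mod)]
```

All arithmetic here is exact integer arithmetic, so there is no rounding error; the only general multiplications are the n pointwise products in the transform domain.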


01 Nov 1983
TL;DR: The simulation studies described show that sign-logarithm arithmetic can be implemented in a practical digital fast Fourier transform (FFT) analyser and the use of a smaller wordlength allows a significant simplification of the system into which the FFT is placed and a higher data throughput rate.
Abstract: The simulation studies described show that sign-logarithm arithmetic can be implemented in a practical digital fast Fourier transform (FFT) analyser. Sign-logarithm arithmetic allows a smaller wordlength than conventional fixed point arithmetic whilst maintaining performance. Discussion of the hardware implementation of such a sign-logarithm FFT shows that power consumption can be less than conventional methods using bipolar multipliers. The use of a smaller wordlength allows a significant simplification of the system into which the FFT is placed and a higher data throughput rate.

6 citations
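
The idea behind sign-logarithm arithmetic can be sketched as follows: a value is stored as a sign bit plus a fixed-point logarithm of its magnitude, so multiplication becomes an integer addition. The 8 fractional bits, the use of `None` as a zero flag, and the function names are all assumptions made for this illustration; the report's actual number format is not given in the abstract.

```python
import math

FRAC_BITS = 8  # hypothetical number of fractional bits in the stored log

def sl_encode(x):
    """Encode x as (sign, round(log2|x| * 2**FRAC_BITS)); None means zero."""
    if x == 0:
        return None
    sign = 1 if x < 0 else 0
    return (sign, round(math.log2(abs(x)) * (1 << FRAC_BITS)))

def sl_decode(v):
    """Convert a sign-logarithm pair back to a float."""
    if v is None:
        return 0.0
    sign, log_fixed = v
    mag = 2.0 ** (log_fixed / (1 << FRAC_BITS))
    return -mag if sign else mag

def sl_mul(a, b):
    """Multiply two encoded values: XOR the signs, add the logarithms."""
    if a is None or b is None:
        return None
    return (a[0] ^ b[0], a[1] + b[1])
```

Addition and subtraction are the harder operations in such a format and are typically handled with a lookup of log2(1 ± 2^d); that step is omitted here.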


Journal ArticleDOI
TL;DR: This correspondence describes a modified version of the Burrus and Rothweiler prime factor FFT algorithm for the powers of prime length that allows a much wider selection of transform sizes, and calculates the DFT in order.
Abstract: This correspondence describes a modified version of the Burrus and Rothweiler prime factor FFT algorithm for the powers of prime length. It allows a much wider selection of transform sizes, and calculates the DFT in order. Speed measurements show that the resulting program can be faster than Singleton's algorithm for some specific transform lengths.

Proceedings ArticleDOI
16 Jun 1983
TL;DR: The application of Winograd's algorithm to OTF calculation is reported on and it is compared with some other methods of computing the OTF.
Abstract: Since the advent of the Cooley-Tukey FFT algorithm, many optical designers who have used the method to compute the optical transfer function with desk-top computers would welcome the availability of even faster algorithms. There has, of course, been a steady improvement in FFT techniques in the past decade or so, but it seems that Winograd's algorithm is the most encouraging yet. In this paper we report on the application of this new algorithm to OTF calculation and compare it with some other methods of computing the OTF. © (1983) SPIE, The International Society for Optical Engineering.