Journal Article

Novel Memory Reference Reduction Methods for FFT Implementations on DSP Processors

TL;DR: Novel memory reference reduction methods minimize the memory references due to twiddle factors when implementing various FFT algorithms on DSP. The proposed methods first group the butterflies with identical twiddle factors from different stages in the FFT diagrams and compute them before the butterflies with different twiddle factors, and then reduce the number of twiddle factor lookups.
Abstract: Memory references in digital signal processors (DSP) are expensive due to their long latencies and high power consumption. Implementing fast Fourier transform (FFT) algorithms on DSP involves many memory references to access butterfly inputs and twiddle factors. Conventional FFT implementations require redundant memory references to load identical twiddle factors for butterflies from different stages in the FFT diagrams. In this paper, we present novel memory reference reduction methods that minimize the memory references due to twiddle factors when implementing various FFT algorithms on DSP. The proposed methods first group the butterflies with identical twiddle factors from different stages in the FFT diagrams and compute them before computing other butterflies with different twiddle factors, and then reduce the number of twiddle factor lookups by taking advantage of the properties of twiddle factors. Consequently, each twiddle factor is loaded only once and the number of memory references due to twiddle factors is minimized. We have applied the proposed methods to implement the radix-2 DIF FFT algorithm on the TI TMS320C64x DSP. Experimental results show that, compared with the conventional implementation, the proposed methods achieve an average reduction of 76.4% in the number of memory references, a 53.5% saving in the memory space used for twiddle factors, and an average reduction of 36.5% in the number of clock cycles needed to compute the radix-2 DIF FFT on DSP. Similar performance gains are reported for radix-2 DIT FFT algorithms implemented with the new methods.
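To make the redundancy concrete, here is a minimal Python sketch of the conventional baseline (not the paper's grouped schedule): a radix-2 DIF FFT that performs one twiddle lookup per butterfly, giving (N/2)·log2 N lookups even though only N/2 distinct twiddle values are ever used. As one illustration of exploiting twiddle-factor properties, it stores only N/4 values and recovers the rest with W_N^(k+N/4) = -j·W_N^k; whether the paper relies on this particular identity is an assumption. All function names are hypothetical.

```python
import cmath
import math

# Hypothetical helper names; an illustrative sketch of the conventional baseline only.


def quarter_table(n):
    """Compact table: store only W_N^k = exp(-2*pi*j*k/N) for 0 <= k < N/4."""
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n // 4)]


def twiddle(table, n, k):
    """Recover W_N^k for 0 <= k < N/2 using W_N^(k + N/4) = -j * W_N^k."""
    q = n // 4
    return table[k] if k < q else -1j * table[k - q]


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    return int(format(i, f"0{bits}b")[::-1], 2)


def conventional_dif_fft(x):
    """Radix-2 DIF FFT loading one twiddle per butterfly (conventional schedule).

    Returns (spectrum in natural order, twiddle lookups, distinct twiddles used).
    """
    a = list(x)
    n = len(a)
    assert n >= 4 and n & (n - 1) == 0, "N must be a power of two, N >= 4"
    table = quarter_table(n)
    lookups, used = 0, set()
    span, stride = n // 2, 1          # stage s uses W_N^(k * 2^s), k < span
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                idx = k * stride
                w = twiddle(table, n, idx)   # one memory reference per butterfly
                lookups += 1
                used.add(idx)
                i, j = start + k, start + k + span
                u, v = a[i], a[j]
                a[i], a[j] = u + v, (u - v) * w
        span //= 2
        stride *= 2
    bits = n.bit_length() - 1
    # DIF leaves the outputs in bit-reversed order
    return [a[bit_reverse(i, bits)] for i in range(n)], lookups, len(used)
```

For N = 8 the baseline performs 12 lookups (4 butterflies in each of 3 stages) but touches only 4 distinct twiddle indices, held in a 2-entry table.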
Citations
Journal Article
TL;DR: An improved butterfly structure and an address generation method for the fast Fourier transform (FFT) are presented; the method uses reduced logic to generate the addresses, avoiding the parity check and barrel shifters commonly used in FFT implementations.
Abstract: In this study, an improved butterfly structure and an address generation method for the fast Fourier transform (FFT) are presented. The proposed method uses reduced logic to generate the addresses, avoiding the parity check and barrel shifters commonly used in FFT implementations. A general methodology for radix-2 N-point transforms is derived and the signal flow graph for a 16-point FFT is presented. Furthermore, as a case study, a 16-point FFT with 32-bit complex numbers is synthesized using a CMOS 0.18 µm technology. The circuit gate count analysis indicates that significant logic reduction can be achieved with improved throughput compared with conventional implementations.
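For context, a minimal Python sketch of the conventional shift-and-mask address generation that such work aims to simplify: for butterfly b of stage s in an in-place radix-2 DIT FFT, both data addresses and the twiddle index come from bit fields of b. This is a generic baseline under our own assumptions (in-place DIT, bit-reversed input, a twiddle table of N/2 entries), not the reduced-logic scheme of the cited study; the function names are hypothetical.

```python
import cmath
import math

# Hypothetical names; a generic baseline, not the cited reduced-logic method.


def butterfly_addresses(n, stage, b):
    """Conventional address generation for butterfly b (0 <= b < N/2) of a DIT stage.

    Returns (top, bottom, twiddle_index); twiddle_index points into a table
    of W_N^k values, 0 <= k < N/2.
    """
    log2n = n.bit_length() - 1
    half = 1 << stage                     # distance between the two butterfly legs
    low = b & (half - 1)                  # position inside the butterfly group
    group = b >> stage                    # which group of size 2*half
    top = (group << (stage + 1)) | low    # insert a 0 bit at position `stage`
    bottom = top + half                   # same address with that bit set to 1
    twiddle_index = low << (log2n - 1 - stage)
    return top, bottom, twiddle_index


def dit_fft(x):
    """In-place radix-2 DIT FFT driven entirely by butterfly_addresses()."""
    n = len(x)
    log2n = n.bit_length() - 1
    assert n >= 2 and n == 1 << log2n, "N must be a power of two"
    a = [x[int(format(i, f"0{log2n}b")[::-1], 2)] for i in range(n)]  # bit-reversed load
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n // 2)]
    for stage in range(log2n):
        for b in range(n // 2):
            top, bottom, t = butterfly_addresses(n, stage, b)
            prod = w[t] * a[bottom]
            a[top], a[bottom] = a[top] + prod, a[top] - prod
    return a
```

In hardware the shifts by `stage` are what a barrel shifter would implement, which is the cost the cited method avoids.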

36 citations


Cites methods from "Novel Memory Reference Reduction Me..."

  • ...Wang [12] proposed a new method to reduce the memory reference, but...

  • ...Shared-memory-based schemes with a single radix-2 butterfly calculation unit [8]–[12] are used in many embedded...

Journal Article
TL;DR: Experimental results confirm that the proposed FFT processor for field programmable gate array (FPGA) devices improves the speed, latency, throughput, accuracy, and resource utilization of computation on FPGA devices over existing designs.

24 citations

Journal Article
TL;DR: Experimental results indicate that the proposed solution method can effectively recognize the oil-leakage fault of gearboxes.
Abstract: This paper presents a novel solution method based on the measurement and analysis of current signals for wind turbine gearbox fault recognition. A gearbox with a typical oil-leakage fault is purposely made. The oil-leakage gearbox and a normal gearbox are used as experimental models to measure and analyze the current signals of the generator. This work employs the wavelet transform (WT), empirical mode decomposition (EMD) and the fast Fourier transform (FFT) to analyze the current signals for both the oil-leakage and the normal gearboxes. K-nearest neighbors (KNN) is used for automatic fault recognition. First, the normal gearbox and the oil-leakage gearbox are tested separately in practical power platform experiments. Second, empirical mode decomposition is applied to obtain the intrinsic mode functions (IMFs) of the current signals, and the fast Fourier transform is used to obtain the IMF spectrum. Finally, the features of the spectrum are extracted, and KNN is used to recognize wind turbine gearbox faults. Experimental results indicate that the proposed solution method can effectively recognize the oil-leakage fault of gearboxes.
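The FFT-spectrum-plus-KNN stage of such a pipeline can be sketched in a few lines of Python. This is a simplified illustration under loudly stated assumptions: the EMD and wavelet steps are omitted, the "current" signals are synthetic (a 50 Hz component, a hypothetical 37 Hz fault component, and Gaussian noise), and the sampling rate, feature length and k = 3 are arbitrary choices, not values from the paper. All names are hypothetical.

```python
import cmath
import math
import random
from collections import Counter

# Hypothetical sketch: synthetic signals, no EMD/WT; parameters are assumptions.
FS = 256  # assumed sampling rate in Hz; 256 samples give 1 Hz FFT bins


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def spectrum_features(signal, max_bin=100):
    """Magnitude spectrum scaled by 1/N, keeping only the first max_bin bins."""
    n = len(signal)
    return [abs(v) / n for v in fft(signal)[:max_bin]]


def knn_predict(train, query, k=3):
    """train is a list of (features, label); majority vote of the k nearest (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def synth_current(faulty, rng, n=FS):
    """Hypothetical stand-in for a measured generator current (not real data)."""
    samples = []
    for i in range(n):
        t = i / FS
        v = math.sin(2 * math.pi * 50 * t) + rng.gauss(0.0, 0.05)
        if faulty:
            v += 0.5 * math.sin(2 * math.pi * 37 * t)   # assumed fault signature
        samples.append(v)
    return samples


rng = random.Random(0)
train = [(spectrum_features(synth_current(f, rng)), "oil-leak" if f else "normal")
         for f in [False] * 5 + [True] * 5]
```

With real data, the features would come from the spectra of the IMFs rather than from the raw signal.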

20 citations


Cites methods from "Novel Memory Reference Reduction Me..."

  • ...Fast Fourier Transform (FFT) This method is the numerical analysis of the Discrete Fourier Transform valid tool for spectrum analysis [12][13]....

  • ...Then, those signals are transformed by fast Fourier transform (FFT) [12]-[13] and illustrate a spectrum....

Proceedings Article
05 Jul 2009
TL;DR: A new approach for higher radix butterflies suitable for pipeline implementation is described, in which the radix-r butterfly computation concept was formulated as composite engines to implement each of the butterfly computations.
Abstract: This article describes a new approach for higher-radix butterflies suitable for pipeline implementation. Based on the butterfly computation introduced by Cooley and Tukey [1], we introduce a novel factorization of the Discrete Fourier Transform (DFT) that redefines the butterfly computation to make it more suitable for efficient VLSI implementation. This factorization motivated us to present a new concept of a radix-r Fast Fourier Transform (FFT), in which the radix-r butterfly computation is formulated as composite engines that implement each of the butterfly computations. This concept enables the radix-r butterfly-processing element (BPE) to be designed with only one complex multiplier in the butterfly critical path for any given r. The algorithmic description and performance of the low-complexity FFT method are considered in this paper, and the parallel pipelined FFT in a companion paper [15], Part II: Parallel Pipelined FFT Processing.
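To make the radix-r idea concrete for r = 4: a generic radix-4 decimation-in-time step scales the four sub-transform outputs by twiddle factors and then combines them with a 4-point DFT whose only "multiplications" are by ±1 and ±j. The Python sketch below shows that textbook decomposition under the assumptions W = exp(-2πj/N) and N a power of 4; it is not the composite-engine BPE of the cited paper, and the function names are hypothetical.

```python
import cmath
import math

# Hypothetical names; textbook radix-4 DIT, not the cited composite-engine BPE.


def radix4_butterfly(y0, y1, y2, y3):
    """4-point DFT using only additions and multiplication by -j (no general multiplier)."""
    a, b = y0 + y2, y0 - y2
    c, d = y1 + y3, (y1 - y3) * -1j
    return a + c, b + d, a - c, b - d


def fft_radix4(x):
    """Recursive radix-4 DIT FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    assert n % 4 == 0, "length must be a power of 4"
    subs = [fft_radix4(x[m::4]) for m in range(4)]   # four N/4-point transforms
    q = n // 4
    out = [0j] * n
    for k in range(q):
        # branch m is scaled by W_N^(m*k); for m = 0 this factor is 1
        ys = [cmath.exp(-2j * math.pi * m * k / n) * subs[m][k] for m in range(4)]
        for quarter, value in enumerate(radix4_butterfly(*ys)):
            out[k + quarter * q] = value
    return out
```

Only the three twiddle products per butterfly need a general complex multiplier; the 4-point core itself needs none.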

13 citations


Cites background from "Novel Memory Reference Reduction Me..."

  • ...Fewer attempts to reduce the computational load have failed, due to the added multipliers in the butterfly's critical path for higher radices [7], [8]....

Journal Article
TL;DR: A new methodology is presented that improves performance by focusing on memory hierarchy utilization and by exploiting the combination of production and consumption of butterflies' results, data reuse, FFT parallelism, symmetries of twiddle factors, and also additions by zeros and multiplications by zeros and ones when twiddle factors are zero or one.
Abstract: Several state-of-the-art (SOA) self-tuning software libraries exist for the fast Fourier transform (FFT), such as the Fastest Fourier Transform in the West (FFTW). The FFT is a highly important kernel, and the performance of its software implementations depends on how well they utilize the memory hierarchy. FFTW minimizes register spills and data cache accesses by finding a schedule that is independent of the number of registers and of the number and size of the cache levels, which is a serious drawback. In this paper, a new methodology is presented that achieves improved performance by focusing on memory hierarchy utilization. The proposed methodology has three major advantages. First, the combination of production and consumption of butterflies' results, data reuse, FFT parallelism, symmetries of twiddle factors, and also additions by zeros and multiplications by zeros and ones when twiddle factors are zero or one, are fully and simultaneously exploited. Second, the optimal solution is found according to the number of registers, the data cache sizes, the number of levels of the data cache hierarchy, the main memory page size, the associativity of the data caches and the data cache line sizes, which are all considered simultaneously rather than separately. Third, compilation time and source code size are very small compared with FFTW. Compared with FFTW, the proposed methodology achieves a performance gain of about 40% (speed-up of 1.7) for architectures with small data cache sizes, where memory management has a larger effect on performance, and of 20% (speed-up of 1.25) on average for architectures with large data cache sizes (Pentium).
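One of the redundancies listed above, multiplication by a twiddle factor equal to one, is easy to show in isolation. The minimal Python sketch below skips that multiplication in a radix-2 DIF FFT and, as our own addition not claimed by the abstract, also replaces the W = -j case by a swap of real and imaginary parts with a sign change. It counts the general complex multiplications that remain; it does not model the register, cache or page-size scheduling that is the methodology's main subject. The function name is hypothetical.

```python
import cmath
import math

# Hypothetical name; only the trivial-twiddle part, no memory-hierarchy scheduling.


def dif_fft_skip_trivial(x):
    """Radix-2 DIF FFT that skips multiplications by W = 1 and replaces W = -j
    by a swap and sign change. Returns (spectrum, general complex multiplications)."""
    a = list(x)
    n = len(a)
    assert n >= 2 and n & (n - 1) == 0, "N must be a power of two"
    mults = 0
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                i, j = start + k, start + k + span
                u, v = a[i], a[j]
                a[i] = u + v
                diff = u - v
                if k == 0:                      # twiddle is 1: no multiplication
                    a[j] = diff
                elif 2 * k == span:             # twiddle is -j: (re, im) -> (im, -re)
                    a[j] = complex(diff.imag, -diff.real)
                else:                           # general twiddle W_(2*span)^k
                    a[j] = diff * cmath.exp(-2j * math.pi * k / (2 * span))
                    mults += 1
        span //= 2
    bits = n.bit_length() - 1
    # DIF output is in bit-reversed order; reorder to natural order
    return [a[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)], mults
```

For N = 16 only 10 general multiplications remain, against one per butterfly (32) in a naive count.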

12 citations


Cites background from "Novel Memory Reference Reduction Me..."

  • ...In Section IV, experimental results are presented, and finally Section V is dedicated to conclusions....

References
Journal Article
TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices.
Abstract: An efficient method for the calculation of the interactions of a $2^m$ factorial experiment was introduced by Yates and is widely known by his name. The generalization to $3^m$ was given by Box et al. (1). Good (2) generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series. In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices, where m is proportional to log N. This results in a procedure requiring a number of operations proportional to N log N rather than $N^2$. These methods are applied here to the calculation of complex Fourier series. They are useful in situations where the number of data points is, or can be chosen to be, a highly composite number. The algorithm is here derived and presented in a rather different form. Attention is given to the choice of N. It is also shown how special advantage can be obtained in the use of a binary computer with $N = 2^m$ and how the entire calculation can be performed within the array of N data storage locations used for the given Fourier coefficients. Consider the problem of calculating the complex Fourier series
$$X(j) = \sum_{k=0}^{N-1} A(k)\, W^{jk}, \qquad j = 0, 1, \ldots, N-1. \qquad (1)$$
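Evaluating equation (1) term by term costs $N^2$ complex multiplications, while splitting A(k) into even and odd k recursively brings the count down to (N/2)·log2 N for N = 2^m. The minimal Python sketch below compares the two; it assumes W = exp(-2πi/N), the usual DSP sign convention (the sign only selects the direction of the transform), uses hypothetical function names, and, being recursive, does not show the in-place storage scheme the abstract describes.

```python
import cmath
import math

# Hypothetical names; W = exp(-2*pi*i/N) is an assumed sign convention.


def dft_direct(a):
    """Evaluate X(j) = sum_k A(k) W^(jk) term by term: N^2 complex multiplications."""
    n = len(a)
    return [sum(a[k] * cmath.exp(-2j * math.pi * ((j * k) % n) / n) for k in range(n))
            for j in range(n)]


def fft_count(a):
    """Radix-2 FFT for N = 2^m; returns (X, number of twiddle multiplications)."""
    n = len(a)
    if n == 1:
        return list(a), 0
    even, m_even = fft_count(a[0::2])   # transform of A(0), A(2), ...
    odd, m_odd = fft_count(a[1::2])     # transform of A(1), A(3), ...
    out = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(-2j * math.pi * j / n) * odd[j]   # one multiplication per j
        out[j], out[j + n // 2] = even[j] + t, even[j] - t
    return out, m_even + m_odd + n // 2
```

For N = 64 the recursion performs 192 twiddle multiplications, against 64^2 = 4096 products in the direct sum.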

11,795 citations


"Novel Memory Reference Reduction Me..." refers methods in this paper

  • ...The first efficient FFT algorithm was discovered by Gauss in the 18th century and rediscovered by Cooley and Tukey [3] in 1960s....

Journal Article
TL;DR: New algorithms for computing the Discrete Fourier Transform of n points are described, which use substantially fewer multiplications than the best algorithm previously known, and about the same number of additions.
Abstract: New algorithms for computing the Discrete Fourier Transform of n points are described. For n in the range of a few tens to a few thousands these algorithms use substantially fewer multiplications than the best algorithm previously known, and about the same number of additions.
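Small-N DFT modules in the style of Winograd's algorithms are the usual illustration of this kind of saving. The standard 3-point module needs, after a few additions, only two multiplications by real constants (multiplying by j is just a swap of real and imaginary parts), where the direct 3-point DFT multiplies by the non-trivial roots W and W^2 four times. The minimal Python sketch below assumes W = exp(-2πi/3); the constant and function names are hypothetical.

```python
import cmath
import math

# Hypothetical names; W = exp(-2*pi*i/3) is assumed.
C1 = math.cos(2 * math.pi / 3) - 1.0   # = -1.5
C2 = math.sin(2 * math.pi / 3)         # = sqrt(3)/2


def winograd_dft3(x0, x1, x2):
    """3-point DFT with two real-constant multiplications."""
    t1 = x1 + x2
    t2 = x1 - x2
    X0 = x0 + t1
    m1 = C1 * t1                 # multiplication 1
    m2 = C2 * t2                 # multiplication 2
    s1 = X0 + m1                 # equals x0 - t1/2
    # +/- j*m2 is only a swap of real/imaginary parts plus a sign change
    return X0, s1 - 1j * m2, s1 + 1j * m2


def dft3_direct(x):
    """Reference: direct evaluation of the 3-point DFT."""
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / 3) for m in range(3))
            for k in range(3)]
```

Comparing `winograd_dft3(*x)` with `dft3_direct(x)` for any three complex inputs gives the same outputs up to rounding.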

707 citations


"Novel Memory Reference Reduction Me..." refers background in this paper

  • ...Later advances in the research of FFT algorithms include the higher radix FFT [4], the mixed-radix FFT [5], the prime-factor FFT [6], Winograd (WFTA) FFT [7], the split-radix FFT [8], [9], the recursive FFT [10], and the combination of decimation-in-time (DIT) and decimation-in-frequency (DIF) FFT algorithms [11]....

Journal Article
R. Singleton
TL;DR: This paper presents an algorithm for computing the fast Fourier transform, based on a method proposed by Cooley and Tukey, and includes an efficient method for permuting the results in place.
Abstract: This paper presents an algorithm for computing the fast Fourier transform, based on a method proposed by Cooley and Tukey. As in their algorithm, the dimension n of the transform is factored (if possible), and n/p elementary transforms of dimension p are computed for each factor p of n. An improved method of computing a transform step corresponding to an odd factor of n is given; with this method, the number of complex multiplications for an elementary transform of dimension p is reduced from $(p-1)^2$ to $(p-1)^2/4$ for odd p. The fast Fourier transform, when computed in place, requires a final permutation step to arrange the results in normal order. This algorithm includes an efficient method for permuting the results in place. The algorithm is described mathematically and illustrated by a FORTRAN subroutine.
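The saving for an odd factor p comes from the symmetry between outputs k and p - k. A minimal Python sketch of one standard way to exploit it (we assume it is close in spirit to the odd-factor step described above, but do not claim it is Singleton's exact routine): form sums s_m = x_m + x_{p-m} and differences d_m = x_m - x_{p-m}, then compute X_k and X_{p-k} together. The 2·((p-1)/2)^2 complex-by-real products cost as much as ((p-1)/2)^2 = (p-1)^2/4 complex multiplications, which matches the count quoted in the abstract. The function name is hypothetical, and W = exp(-2πi/p) is assumed.

```python
import math

# Hypothetical name; W = exp(-2*pi*i/p) is assumed.


def odd_dft_paired(x):
    """DFT of odd length p, producing X[k] and X[p-k] together."""
    p = len(x)
    assert p % 2 == 1, "length must be odd"
    h = (p - 1) // 2
    s = [x[m] + x[p - m] for m in range(1, h + 1)]   # symmetric sums
    d = [x[m] - x[p - m] for m in range(1, h + 1)]   # antisymmetric differences
    out = [0j] * p
    out[0] = x[0] + sum(s)
    for k in range(1, h + 1):
        # only real constants multiply data: cosines act on s, sines act on d
        a = x[0] + sum(s[m - 1] * math.cos(2 * math.pi * k * m / p) for m in range(1, h + 1))
        b = sum(d[m - 1] * math.sin(2 * math.pi * k * m / p) for m in range(1, h + 1))
        out[k] = a - 1j * b
        out[p - k] = a + 1j * b
    return out
```

For p = 5, the four outputs X_1 to X_4 come from two (a, b) pairs instead of four separate sums.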

534 citations


"Novel Memory Reference Reduction Me..." refers background in this paper

  • ...Later advances in the research of FFT algorithms include the higher radix FFT [4], the mixed-radix FFT [5], the prime-factor FFT [6], Winograd (WFTA) FFT [7], the split-radix FFT [8], [9], the recursive FFT [10], and the combination of decimation-in-time (DIT) and decimation-in-frequency (DIF) FFT algorithms [11]....

Journal Article
TL;DR: A new $N = 2^n$ fast Fourier transform algorithm is presented that has fewer multiplications and additions than radix-$2^n$ ($n = 1, 2, 3$) algorithms, and the same number of multiplications as the Rader-Brenner algorithm but far fewer additions.
Abstract: A new $N = 2^n$ fast Fourier transform algorithm is presented, which has fewer multiplications and additions than radix-$2^n$ ($n = 1, 2, 3$) algorithms, has the same number of multiplications as the Rader-Brenner algorithm but far fewer additions, is numerically better conditioned, and is performed 'in place' by a repetitive use of a 'butterfly'-type structure.
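The split-radix idea can be sketched recursively: the even-indexed samples go through an N/2-point transform, and the samples at 4n+1 and 4n+3 through two N/4-point transforms whose outputs are twiddled by W^k and W^(3k) and combined using only ±1 and ±j. This minimal Python sketch is the standard textbook recursion under the assumption W = exp(-2πi/N); it does not reproduce the in-place butterfly structure of the paper, and the function name is hypothetical.

```python
import cmath
import math

# Hypothetical name; textbook split-radix recursion, W = exp(-2*pi*i/N) assumed.


def split_radix_fft(x):
    """Split-radix FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    u = split_radix_fft(x[0::2])    # N/2-point DFT of the even samples
    z = split_radix_fft(x[1::4])    # N/4-point DFT of x[4n+1]
    z3 = split_radix_fft(x[3::4])   # N/4-point DFT of x[4n+3]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        a = cmath.exp(-2j * math.pi * k / n) * z[k]        # W^k  * Z[k]
        b = cmath.exp(-2j * math.pi * 3 * k / n) * z3[k]   # W^3k * Z3[k]
        s, d = a + b, a - b
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] - 1j * d       # uses W^(N/4) = -j
        out[k + 3 * q] = u[k + q] + 1j * d   # uses W^(3N/4) = j
    return out
```

Each level needs only two general twiddle products per k, which is where the saving over a plain radix-2 split comes from.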

412 citations


"Novel Memory Reference Reduction Me..." refers methods in this paper

  • ...Later advances in the research of FFT algorithms include the higher radix FFT [4], the mixed-radix FFT [5], the prime-factor FFT [6], Winograd (WFTA) FFT [7], the split-radix FFT [8], [9], the recursive FFT [10], and the combination of decimation-in-time (DIT) and decimation-in-frequency (DIF) FFT algorithms [11]....
