Journal ArticleDOI

New continuous-flow mixed-radix (CFMR) FFT Processor using novel in-place strategy

05 Jul 2005 - IEEE Transactions on Circuits and Systems I: Regular Papers (IEEE) - Vol. 52, Iss. 5, pp. 911-919
TL;DR: A new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy that can reduce hardware complexity and computation cycles compared with existing FFT processors is proposed.
Abstract: The paper proposes a new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy. The existing in-place strategy supports only a fixed-radix FFT algorithm. In contrast, the proposed in-place strategy can support the MR algorithm, which allows CF FFT computations regardless of the FFT length. The novel in-place strategy is made by interchanging storage locations of butterfly outputs. The CFMR FFT processor provides the MR algorithm, the in-place strategy, and the CF FFT computations at the same time. The CFMR FFT processor requires only two N-word memories due to the proposed in-place strategy. In addition, it uses one butterfly unit that can perform either one radix-4 butterfly or two radix-2 butterflies. The CFMR FFT processor using the 0.18 μm SEC cell library consists of 37,000 gates excluding memories, requires only 640 clock cycles for a 512-point FFT and runs at 100 MHz. Therefore, the CFMR FFT processor can reduce hardware complexity and computation cycles compared with existing FFT processors.
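
The 640-cycle figure for a 512-point FFT can be checked with a short back-of-the-envelope sketch. It assumes (this is not stated in the abstract) that the single butterfly unit finishes one radix-4 butterfly or two radix-2 butterflies per clock cycle and that pipeline fill and I/O overhead are ignored. The function names are hypothetical.

```python
def mixed_radix_plan(n):
    """Split a power-of-two length into radix-4 stages plus at most one radix-2 stage."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    log2n = n.bit_length() - 1
    return [4] * (log2n // 2) + [2] * (log2n % 2)


def butterfly_unit_cycles(n):
    """Cycle estimate for one butterfly unit (assumed throughput, see lead-in)."""
    total = 0
    for radix in mixed_radix_plan(n):
        butterflies = n // radix            # butterflies in this stage
        per_cycle = 1 if radix == 4 else 2  # assumption: two radix-2 per cycle
        total += -(-butterflies // per_cycle)  # ceiling division
    return total


print(mixed_radix_plan(512))       # [4, 4, 4, 4, 2]
print(butterfly_unit_cycles(512))  # 640
```

Under these assumptions every stage costs N/4 cycles, so the five stages of a 512-point transform give 5 x 128 = 640 cycles, which agrees with the number quoted in the abstract.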
Citations
Journal ArticleDOI
TL;DR: A fault diagnosis strategy based on principal component analysis and the multiclass relevance vector machine (PCA-mRVM) that not only achieves higher model sparsity and shorter diagnosis time, but also provides probabilistic outputs for every class membership.
Abstract: Multilevel inverters, owing to their distinctive performance, have been widely used in high-voltage and high-power applications in recent years. Because the reliability of power electronics equipment is critical, faults must be detected and located as quickly as possible to ensure stable operation of multilevel inverter systems. In this context, and to improve the fault diagnosis accuracy and efficiency of a cascaded H-bridge multilevel inverter system (CHMLIS), a fault diagnosis strategy based on principal component analysis and the multiclass relevance vector machine (PCA-mRVM) is proposed in this paper. First, CHMLIS output voltage signals are selected as the input fault classification characteristic signals. Then, a fast Fourier transform is used to preprocess these signals. PCA is used to extract the features of the fault signals and to reduce the sample dimensions. Finally, an mRVM model is used to classify faulty samples. Compared to traditional approaches, the proposed PCA-mRVM strategy not only achieves higher model sparsity and shorter diagnosis time, but also provides probabilistic outputs for every class membership. Experimental tests are carried out to demonstrate the performance of the proposed PCA-mRVM diagnosis.

181 citations


Cites methods from "New continuous-flow mixed-radix (CF..."

  • ...First, to make the fault signature obvious, the fast Fourier transform (FFT) of the sampled data is computed in order to extract the frequency domain of the signals [31], [32]....

Journal ArticleDOI
TL;DR: A multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length is presented.
Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-Ns butterflies at each stage, where Ns is the number of data streams, so that there is only one butterfly needed in each stage. Consequently, a 100% utilization rate in computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let Ns=4 and implement a 4-stream FFT/IFFT processor with variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP long term evolution applications. The processor was implemented with a UMC 90-nm CMOS technology with a core area of 3.1 mm². The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT, respectively, in the post-layout simulation. Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show advantages of the proposed scheme in terms of area and power consumption.

99 citations


Cites background from "New continuous-flow mixed-radix (CF..."

  • ...Continuous-flow mixed-radix (CFMR) FFT [8], [9] utilizes two N-sample memories to generate a continuous output stream....

Journal ArticleDOI
TL;DR: A generalized mixed-radix (GMR) algorithm is proposed for memory-based fast Fourier transform (FFT) processors to support prime-sized and traditional 2^n-point FFTs simultaneously; it transforms the index to a multidimensional vector for efficient computation.
Abstract: In this brief, a generalized mixed-radix (GMR) algorithm is proposed for memory-based fast Fourier transform (FFT) processors to support prime-sized and traditional 2^n-point FFTs simultaneously. It transforms the index to a multidimensional vector for efficient computation. By controlling the index vector to satisfy the "vector reverse" behavior, the GMR algorithm can support not only in-place policy for both computation and I/O data for continuous data flow to minimize the memory size but also multibank memory structures to increase the maximum throughput without memory conflict. Finally, a low-complexity implementation of an index vector generator is also proposed for our algorithm.
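
To illustrate what "transforming the index to a multidimensional vector" can look like, the sketch below uses textbook mixed-radix digit decomposition and digit reversal. It is an assumption that the brief's "vector reverse" behaves like this generalization of bit reversal; the function names and the example radices are hypothetical and are not taken from the paper.

```python
def index_to_vector(n, radices):
    """Mixed-radix digits of n, least significant digit first."""
    digits = []
    for r in radices:
        n, d = divmod(n, r)
        digits.append(d)
    return digits


def vector_to_index(digits, radices):
    """Inverse of index_to_vector for the same radix order."""
    index = 0
    for d, r in zip(reversed(digits), reversed(radices)):
        index = index * r + d
    return index


def vector_reverse(n, radices):
    """Reverse the digit vector and the radix order together."""
    digits = index_to_vector(n, radices)
    return vector_to_index(digits[::-1], radices[::-1])


# A 6-point transform factored as 2 x 3
print([vector_reverse(n, [2, 3]) for n in range(6)])  # [0, 3, 1, 4, 2, 5]
# With all radices equal to 2 this reduces to ordinary bit reversal
print(vector_reverse(1, [2, 2, 2]))                   # 4
```

Because the mapping is a permutation of 0..N-1 for any radix list, the same memory locations can serve inputs and outputs, which is the property an in-place policy relies on.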

81 citations

Journal ArticleDOI
TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC) is presented.
Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.

70 citations


Cites background or methods from "New continuous-flow mixed-radix (CF..."

  • ...1, it supports the continuous-flow operation and merges the input and output buffer so as to minimize the total memory requirement to as in [9] and [10]....

  • ...However, it has been shown in [9] and [10] that a continuous-flow FFT processor can minimize the storage to...

  • ...In the past, numerous FFT processors have been designed [2]–[9]....

  • ...In [9] and [11], an in-place strategy was applied for the radix-2/4 butterfly unit....

  • ...Among them, pipelined single-path delay feedback (SDF) architecture [3]–[6] and memory-based/cache-memory-based architecture [7]–[9] are two popular solutions....

Journal ArticleDOI
TL;DR: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph.
Abstract: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals. The proposed computation is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. A new processing element (PE) is proposed using two radix-2 butterflies that can process four inputs in parallel. A novel conflict-free memory-addressing scheme is proposed to ensure the continuous operation of the FFT processor. Furthermore, the addressing scheme is extended to support multiple parallel PEs. The proposed real-FFT processor simultaneously requires fewer computation cycles and lower hardware cost compared to prior work. For example, the proposed design with two PEs reduces the computation cycles by a factor of 2 for a 256-point real fast Fourier transform (RFFT) compared to a prior work while maintaining a lower hardware complexity. The number of computation cycles is reduced proportionately with the increase in the number of PEs.
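
The redundancy that a real-valued FFT can remove comes from conjugate symmetry: for real input, bin N-k is the complex conjugate of bin k, so only bins 0..N/2 carry independent information. The sketch below demonstrates that property with a direct DFT. It is a generic illustration, not the modified radix-2 flow graph or the processing element of the brief, and the helper names and sample data are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only to illustrate the symmetry."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rebuild_from_half(half, n):
    """Rebuild the full spectrum of a real signal from bins 0..n//2."""
    return [half[k] if k <= n // 2 else half[n - k].conjugate() for k in range(n)]


x = [0.5, -1.0, 2.0, 3.5, 0.0, -2.5, 1.0, 4.0]  # arbitrary real samples
full = dft(x)
half = full[: len(x) // 2 + 1]
rebuilt = rebuild_from_half(half, len(x))
max_err = max(abs(a - b) for a, b in zip(full, rebuilt))
print(len(half), max_err < 1e-9)
```

Storing and computing roughly half of the bins is what allows a real-FFT design to cut memory and work relative to a complex FFT of the same length.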

56 citations


Cites background from "New continuous-flow mixed-radix (CF..."

  • ...Moreover, when compared to the in-place complex FFT [5], [6], one obvious advantage of our proposed RFFT architecture is that the length of the required memory can be reduced by a factor of 2....

References
Journal ArticleDOI
TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 μm CMOS process and is fully functional on first-pass silicon.
Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460,000-transistor design has been fabricated in a standard 0.7 μm (L_poly = 0.6 μm) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 μs while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz, which is a clock rate 2.6 times greater than the previously fastest rate.

319 citations


"New continuous-flow mixed-radix (CF..." refers background or methods or result in this paper

  • ...The existing processors [21], [22] require one N-word memory since they use the in-place strategy [20]; however, they cannot support the CF FFT. Spiffee [21], which uses the in-place strategy, requires many computation cycles since it uses the radix-2 algorithm....

  • ...The Spiffee processor [21] uses the radix-2 algorithm, one N-word memory and two cache memories partitioned into two banks....

  • ...Spiffee supports only a 1,024-point FFT and requires about 5,100 cycles for a 1,024-point FFT....

  • ...However, the existing FFT processors [21]–[23] cannot support all of the MR algorithm, the in-place strategy, and the CF FFT at the same time....
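
The "about 5,100 cycles" quoted above for a radix-2 design is close to the number of radix-2 butterflies in a 1,024-point FFT. The sketch below shows that count; treating it as a cycle count assumes one butterfly per cycle, which is an assumption made here for illustration and is not confirmed by the excerpts. The function name is hypothetical.

```python
def radix2_butterflies(n):
    """Number of radix-2 butterflies in an n-point FFT: (n / 2) * log2(n)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1
    return (n // 2) * stages


print(radix2_butterflies(1024))  # 5120
```

A radix-2 algorithm needs log2(N) passes over the data, whereas a radix-4 or mixed-radix algorithm needs about half as many, which is the source of the cycle savings discussed in the excerpts.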

Proceedings ArticleDOI
11 May 1998
TL;DR: By exploiting the spatial regularity of the new algorithm, the minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor.
Abstract: The design and implementation of a 1024-point pipeline FFT processor is presented. The architecture is based on a new form of FFT, the radix-2^2 algorithm. By exploiting the spatial regularity of the new algorithm, the minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor. The chip has been implemented in 0.5 μm CMOS technology and takes an area of 40 mm². With a 3.3 V power supply, it can compute 2^n (n = 0, 1, ..., 10) complex-point forward and inverse FFTs in real time with up to 30 MHz sampling frequency. The SQNR is above 50 dB for white noise input.
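
The multiplier and memory figures in the abstract match the resource formulas commonly cited for a radix-2^2 single-path delay feedback pipeline: log4(N) - 1 complex multipliers and N - 1 words of feedback memory. The sketch below evaluates them. It assumes those formulas apply to this chip and that N is a power of four; the function name and dictionary keys are hypothetical.

```python
def r22sdf_resources(n):
    """Commonly cited resource counts for a radix-2^2 SDF pipeline (assumed here)."""
    log4n = (n.bit_length() - 1) // 2
    if n < 4 or 4 ** log4n != n:
        raise ValueError("n must be a power of four >= 4")
    return {
        "complex_multipliers": log4n - 1,  # one between each pair of radix-2^2 stages
        "feedback_memory_words": n - 1,    # total delay-line length
    }


print(r22sdf_resources(1024))
# {'complex_multipliers': 4, 'feedback_memory_words': 1023}
```

For N = 1024 this gives 4 multipliers and 1023 memory words, consistent with the "4 complex multipliers and 1024 complex-word data memory" stated above.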

243 citations


"New continuous-flow mixed-radix (CF..." refers methods in this paper

  • ...Thus, the FFT processors [13]–[15] using the radix-4 algorithm have been proposed....

Journal ArticleDOI
TL;DR: In this paper, a multibank address assignment for an arbitrary fixed radix fast Fourier transform (FFT) algorithm suitable for high-speed single-chip implementation is developed, which is memory-bank conflict-free to allow simultaneous access to all the data needed for calculation of each of the radix r butterflies as they occur in the algorithm.
Abstract: A multibank memory address assignment for an arbitrary fixed radix fast Fourier transform (FFT) algorithm suitable for high-speed single-chip implementation is developed. The memory assignment is 'in place' to minimize memory size and is memory-bank conflict-free to allow simultaneous access to all the data needed for calculation of each of the radix r butterflies as they occur in the algorithm. Address generation for table lookup of twiddle factors is also included. The data and twiddle factor address generation hardware is shown to have small size and high speed.
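
One well-known way to obtain a conflict-free bank assignment for a fixed radix r is to take the sum of the base-r digits of the data index modulo r. The sketch below checks by brute force that this rule places the r operands of every butterfly in r different banks. It is assumed here for illustration; the abstract does not spell out the assignment rule, and the row address within a bank and the twiddle-factor addressing are not modeled. All names are hypothetical.

```python
def digits(index, radix, num_digits):
    """Base-`radix` digits of index, least significant first."""
    out = []
    for _ in range(num_digits):
        index, d = divmod(index, radix)
        out.append(d)
    return out


def bank(index, radix, num_digits):
    """Digit-sum bank rule: bank = (sum of base-r digits) mod r."""
    return sum(digits(index, radix, num_digits)) % radix


def conflict_free(radix, num_digits):
    """Check that every butterfly at every stage touches each bank exactly once."""
    n = radix ** num_digits
    for stage in range(num_digits):
        weight = radix ** stage
        for base in range(n):
            if (base // weight) % radix != 0:
                continue  # visit each butterfly once, via its member with digit 0
            operands = [base + j * weight for j in range(radix)]
            banks = {bank(i, radix, num_digits) for i in operands}
            if len(banks) != radix:
                return False
    return True


print(conflict_free(4, 3))  # True for a 64-point radix-4 FFT
```

The check succeeds because the operands of one butterfly differ only in a single digit that takes all r values, so their digit sums cover every residue modulo r.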

131 citations

Journal ArticleDOI
TL;DR: This article discusses the VLSI implications of high-speed coded orthogonal frequency-division multiplexing modulation by looking at practical examples of the computational blocks that constitute a COFDM modem and then examining examples of COFDM chips.
Abstract: This article discusses the VLSI implications of high-speed coded orthogonal frequency-division multiplexing modulation. This is achieved by looking at practical examples of the computational blocks that constitute a COFDM modem and then examining examples of COFDM chips.

77 citations


"New continuous-flow mixed-radix (CF..." refers background in this paper

  • ...The memory is a dominant component in terms of hardware complexity and power consumption [7]....

  • ...In multicarrier modulation, data symbols are transmitted in parallel on multiple subcarriers [7]....

Proceedings ArticleDOI
20 Oct 1999
TL;DR: This work discusses the design and implementation of a high-speed, low power 1024-point pipeline FFT processor, which is efficient in terms of power consumption and chip area.
Abstract: We discuss the design and implementation of a high-speed, low power 1024-point pipeline FFT processor. Key features are flexible internal data length and a novel processing element. The FFT processor, which is implemented in a standard 0.35 /spl mu/m CMOS process, is efficient in terms of power consumption and chip area.

65 citations


"New continuous-flow mixed-radix (CF..." refers background in this paper

  • ...For high throughput applications, the pipeline architectures [8]–[10] have been proposed....
