New continuous-flow mixed-radix (CFMR) FFT Processor using novel in-place strategy

doi:10.1109/TCSI.2005.846667

Journal Article•DOI•

B.G. Jo, Myung Hoon Sunwoo¹•Institutions (1)

Ajou University¹

05 Jul 2005-IEEE Transactions on Circuits and Systems I: Regular Papers (IEEE)-Vol. 52, Iss: 5, pp 911-919

TL;DR: A new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy that can reduce hardware complexity and computation cycles compared with existing FFT processors is proposed.

Abstract: The paper proposes a new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy. The existing in-place strategy supports only a fixed-radix FFT algorithm. In contrast, the proposed in-place strategy can support the MR algorithm, which allows CF FFT computations regardless of the length of the FFT. The novel in-place strategy is made by interchanging storage locations of butterfly outputs. The CFMR FFT processor provides the MR algorithm, the in-place strategy, and the CF FFT computations at the same time. The CFMR FFT processor requires only two N-word memories due to the proposed in-place strategy. In addition, it uses one butterfly unit that can perform either one radix-4 butterfly or two radix-2 butterflies. The CFMR FFT processor using the 0.18 μm SEC cell library consists of 37,000 gates excluding memories, requires only 640 clock cycles for a 512-point FFT, and runs at 100 MHz. Therefore, the CFMR FFT processor can reduce hardware complexity and computation cycles compared with existing FFT processors.
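The radix-4/2 decomposition and the 640-cycle figure can be sketched as follows. This is a minimal illustration under stated assumptions, not the processor's actual datapath or addressing: the function names (`radix_plan`, `butterfly_cycles`, `mixed_radix_fft`) are hypothetical, the FFT is a plain recursive decimation-in-time formulation rather than the in-place strategy of the paper, and the cycle model simply assumes one radix-4 butterfly or two radix-2 butterflies complete per clock cycle with no pipeline or I/O overhead. The abstract does not give the cycle breakdown, so the agreement with 640 should be read as a consistency check only.

```python
import cmath


def radix_plan(n):
    """Split a power-of-two length into radix-4 stages plus at most one radix-2 stage."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    plan = []
    while n > 1:
        r = 4 if n % 4 == 0 else 2
        plan.append(r)
        n //= r
    return plan


def butterfly_cycles(n):
    """Cycle count under the ASSUMED model: one radix-4 butterfly or two
    radix-2 butterflies per clock cycle, with no other overhead."""
    cycles = 0
    for r in radix_plan(n):
        butterflies = n // r              # butterflies in a radix-r stage
        per_cycle = 1 if r == 4 else 2    # shared butterfly unit, as in the abstract
        cycles += -(-butterflies // per_cycle)  # ceiling division
    return cycles


def mixed_radix_fft(x):
    """Recursive radix-4/2 decimation-in-time FFT (illustrative, not in-place)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    radix_plan(n)  # validates that n is a power of two
    r = 4 if n % 4 == 0 else 2
    m = n // r
    # FFT of each of the r interleaved subsequences.
    subs = [mixed_radix_fft(x[j::r]) for j in range(r)]
    out = [0j] * n
    for q in range(m):
        # Apply the twiddle factors W_n^(j*q) to the butterfly inputs.
        t = [subs[j][q] * cmath.exp(-2j * cmath.pi * j * q / n) for j in range(r)]
        # Radix-r butterfly: an r-point DFT across the twiddled inputs.
        for p in range(r):
            out[q + m * p] = sum(
                t[j] * cmath.exp(-2j * cmath.pi * j * p / r) for j in range(r)
            )
    return out


if __name__ == "__main__":
    print(radix_plan(512))        # radix of each stage
    print(butterfly_cycles(512))  # cycles under the assumed model
```

Under this model a 512-point transform uses four radix-4 stages of 128 butterflies each and one radix-2 stage of 256 butterflies processed two at a time, which is 4 × 128 + 128 = 640 cycles, or 6.4 μs at 100 MHz.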

Citations

Journal Article•DOI•

Cascaded H-Bridge Multilevel Inverter System Fault Diagnosis Using a PCA and Multiclass Relevance Vector Machine Approach

Tianzhen Wang¹, Hao Xu¹, Jingang Han¹, Elhoussin Elbouchikhi, Mohamed Benbouzid•Institutions (1)

Shanghai Maritime University¹

16 Jan 2015-IEEE Transactions on Power Electronics

TL;DR: A fault diagnosis strategy based on principal component analysis and the multiclass relevance vector machine (PCA-mRVM) that not only achieves higher model sparsity and shorter diagnosis time, but also provides probabilistic outputs for every class membership.

Abstract: Multilevel inverters, owing to their distinctive performance, have been widely used in high-voltage and high-power applications in recent years. Because the reliability of power electronics equipment is very important, and to ensure stable operation of multilevel inverter systems, it is important to detect and locate faults as quickly as possible. In this context, and to improve the fault diagnosis accuracy and efficiency of a cascaded H-bridge multilevel inverter system (CHMLIS), a fault diagnosis strategy based on principal component analysis and the multiclass relevance vector machine (PCA-mRVM) is proposed in this paper. First, CHMLIS output voltage signals are selected as the input fault classification characteristic signals. Then, a fast Fourier transform is used to preprocess these signals. PCA is used to extract fault signal features and to reduce sample dimensions. Finally, an mRVM model is used to classify faulty samples. Compared to traditional approaches, the proposed PCA-mRVM strategy not only achieves higher model sparsity and shorter diagnosis time, but also provides probabilistic outputs for every class membership. Experimental tests are carried out to highlight the performance of the proposed PCA-mRVM diagnosis.

181 citations

Cites methods from "New continuous-flow mixed-radix (CF..."

...First, to make the fault signature obvious, the fast Fourier transform (FFT) of the sampled data is computed in order to extract the frequency domain of the signals [31], [32]....

Journal Article•DOI•

MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems

Kai-Jiun Yang¹, Shang-Ho Tsai¹, G. C. H. Chuang•Institutions (1)

National Chiao Tung University¹

01 Apr 2013-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length is presented.

Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-Ns butterflies at each stage, where Ns is the number of data streams, so that there is only one butterfly needed in each stage. Consequently, a 100% utilization rate in computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let Ns=4 and implement a 4-stream FFT/IFFT processor with variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP long term evolution applications. The processor was implemented with a UMC 90-nm CMOS technology with a core area of 3.1 mm². The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT, respectively, in the post-layout simulation. Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show advantages of the proposed scheme in terms of area and power consumption.

99 citations

Cites background from "New continuous-flow mixed-radix (CF..."

...Continuous-flow mixed-radix (CFMR) FFT [8], [9] utilizes two N-sample memories to generate a continuous output stream....

Journal Article•DOI•

A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors

Chen-Fong Hsiao¹, Yuan Chen¹, Chen-Yi Lee¹•Institutions (1)

National Chiao Tung University¹

01 Jan 2010-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: A generalized mixed-radix (GMR) algorithm is proposed for memory-based fast Fourier transform (FFT) processors to support prime-sized and traditional 2^n-point FFTs simultaneously; it transforms the index to a multidimensional vector for efficient computation.

Abstract: In this brief, a generalized mixed-radix (GMR) algorithm is proposed for memory-based fast Fourier transform (FFT) processors to support prime-sized and traditional 2^n-point FFTs simultaneously. It transforms the index to a multidimensional vector for efficient computation. By controlling the index vector to satisfy the "vector reverse" behavior, the GMR algorithm can support not only in-place policy for both computation and I/O data for continuous data flow to minimize the memory size but also multibank memory structures to increase the maximum throughput without memory conflict. Finally, a low-complexity implementation of an index vector generator is also proposed for our algorithm.

81 citations

Journal Article•DOI•

A Generalized Conflict-Free Memory Addressing Scheme for Continuous-Flow Parallel-Processing FFT Processors With Rescheduling

Pei-Yun Tsai¹, Chung-Yi Lin¹•Institutions (1)

National Central University¹

01 Dec 2011-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC) is presented.

Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.

70 citations

Cites background or methods from "New continuous-flow mixed-radix (CF..."

...1, it supports the continuous-flow operation and merges the input and output buffer so as to minimize the total memory requirement to as in [9] and [10]....
...However, it has been shown in [9] and [10] that a continuous-flow FFT processor can minimize the storage to...
...In the past, numerous FFT processors have been designed [2]–[9]....
...In [9] and [11], an in-place strategy was applied for the radix-2/4 butterfly unit....
...Among them, pipelined single-path delay feedback (SDF) architecture [3]–[6] and memory-based/cache-memory-based architecture [7]–[9] are two popular solutions....

Journal Article•DOI•

An In-Place FFT Architecture for Real-Valued Signals

Manohar Ayinala¹, Yingjie Lao¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

08 Aug 2013-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph.

Abstract: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals. The proposed computation is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. A new processing element (PE) is proposed using two radix-2 butterflies that can process four inputs in parallel. A novel conflict-free memory-addressing scheme is proposed to ensure the continuous operation of the FFT processor. Furthermore, the addressing scheme is extended to support multiple parallel PEs. The proposed real-FFT processor simultaneously requires fewer computation cycles and lower hardware cost compared to prior work. For example, the proposed design with two PEs reduces the computation cycles by a factor of 2 for a 256-point real fast Fourier transform (RFFT) compared to a prior work while maintaining a lower hardware complexity. The number of computation cycles is reduced proportionately with the increase in the number of PEs.

56 citations

Cites background from "New continuous-flow mixed-radix (CF..."

...Moreover, when compared to the in-place complex FFT [5], [6], one obvious advantage of our proposed RFFT architecture is that the length of the required memory can be reduced by a factor of 2....

References

Journal Article•DOI•

A low-power, high-performance, 1024-point FFT processor

Bevan M. Baas¹•Institutions (1)

Stanford University¹

01 Mar 1999-IEEE Journal of Solid-State Circuits

TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 μm CMOS process and is fully functional on first-pass silicon.

Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460,000-transistor design has been fabricated in a standard 0.7 μm (L_poly = 0.6 μm) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 μs while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz, which is a clock rate 2.6 times greater than the previously fastest rate.

319 citations

"New continuous-flow mixed-radix (CF..." refers background or methods or result in this paper

...The existing processors [21], [22] require one N-word memory since they use the in-place strategy [20]; however, they cannot support the CF FFT. Spiffee [21] using the in-place strategy requires many computation cycles since it uses the radix-2 algorithm....
...The Spiffee processor [21] uses the radix-2 algorithm, one N-word memory and two cache memories partitioned into two banks....
...Spiffee supports only a 1,024-point FFT and requires about 5,100 cycles for a 1,024-point FFT....
...However, the existing FFT processors [21]–[23] cannot support all of the MR algorithm, the in-place strategy, and the CF FFT at the same time....
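The "about 5,100 cycles" quoted above for a radix-2 design is consistent with a plain butterfly count. The sketch below is a rough model only: it assumes one radix-2 butterfly completes per clock cycle and ignores cache loads and pipeline latency, neither of which the snippets specify. The function name `radix2_butterfly_count` is hypothetical.

```python
def radix2_butterfly_count(n):
    """Number of radix-2 butterflies in an n-point FFT: (n / 2) * log2(n)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1   # exact log2 for a power of two
    return (n // 2) * stages


if __name__ == "__main__":
    # ASSUMPTION: one butterfly per cycle, so cycles == butterflies.
    print(radix2_butterfly_count(1024))
```

For n = 1024 this gives 512 butterflies in each of 10 stages, or 5,120 in total, which rounds to the quoted figure.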

Proceedings Article•DOI•

Design and implementation of a 1024-point pipeline FFT processor

Shousheng He¹, M. Torkelson¹•Institutions (1)

Lund University¹

11 May 1998

TL;DR: By exploiting the spatial regularity of the new algorithm, minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor.

Abstract: The design and implementation of a 1024-point pipeline FFT processor is presented. The architecture is based on a new form of FFT, the radix-2^2 algorithm. By exploiting the spatial regularity of the new algorithm, minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor. The chip has been implemented in 0.5 μm CMOS technology and takes an area of 40 mm². With a 3.3 V power supply, it can compute 2^n (n = 0, 1, ..., 10) complex-point forward and inverse FFT in real time with up to 30 MHz sampling frequency. The SQNR is above 50 dB for white noise input.

243 citations

"New continuous-flow mixed-radix (CF..." refers methods in this paper

...Thus, the FFT processors [13]–[15] using the radix-4 algorithm have been proposed....

Journal Article•DOI•

Conflict free memory addressing for dedicated FFT hardware

L.G. Johnson¹•Institutions (1)

Oklahoma State University–Stillwater¹

01 May 1992-IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing

TL;DR: In this paper, a multibank address assignment for an arbitrary fixed radix fast Fourier transform (FFT) algorithm suitable for high-speed single-chip implementation is developed, which is memory-bank conflict-free to allow simultaneous access to all the data needed for calculation of each of the radix r butterflies as they occur in the algorithm.

Abstract: A multibank memory address assignment for an arbitrary fixed radix fast Fourier transform (FFT) algorithm suitable for high-speed single-chip implementation is developed. The memory assignment is 'in place' to minimize memory size and is memory-bank conflict-free to allow simultaneous access to all the data needed for calculation of each of the radix r butterflies as they occur in the algorithm. Address generation for table lookup of twiddle factors is also included. The data and twiddle factor address generation hardware is shown to have small size and high speed.
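One way to see how a bank assignment can be conflict-free for every radix-r butterfly is the classic digit-sum rule, sketched below. This is an illustration under assumptions: the abstract does not state the assignment rule, so the rule used here (bank = sum of the radix-r digits of the address, modulo r) and all function names are assumptions rather than the paper's confirmed method. The sketch also assumes an in-place fixed-radix FFT in which the r operands of a butterfly at a given stage differ only in the address digit for that stage.

```python
def digits(addr, radix, n_digits):
    """Little-endian radix-r digits of an address."""
    out = []
    for _ in range(n_digits):
        out.append(addr % radix)
        addr //= radix
    return out


def bank_of(addr, radix, n_digits):
    """ASSUMED rule: bank index = (sum of radix-r address digits) mod r."""
    return sum(digits(addr, radix, n_digits)) % radix


def butterfly_groups(radix, n_digits, stage):
    """Yield the r operand addresses of every butterfly at one stage.

    Operands of one butterfly share all digits except digit `stage`,
    which takes each value 0..r-1 exactly once.
    """
    weight = radix ** stage
    for addr in range(radix ** n_digits):
        if (addr // weight) % radix == 0:
            yield [addr + k * weight for k in range(radix)]


def is_conflict_free(radix, n_digits):
    """True if every butterfly at every stage touches r distinct banks."""
    for stage in range(n_digits):
        for group in butterfly_groups(radix, n_digits, stage):
            banks = {bank_of(a, radix, n_digits) for a in group}
            if len(banks) != radix:
                return False
    return True


if __name__ == "__main__":
    print(is_conflict_free(4, 4))  # 256-point FFT, radix 4, four banks
```

Because the operands of one butterfly differ in exactly one digit, their digit sums take r consecutive values, so the r banks are always distinct. Writing results back to the same addresses keeps the computation in place.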

131 citations

Journal Article•DOI•

VLSI for OFDM

Neil Weste¹, D.J. Skellern•Institutions (1)

Macquarie University¹

01 Oct 1998-IEEE Communications Magazine

TL;DR: This article discusses the VLSI implications of high-speed coded orthogonal frequency-division multiplexing modulation by looking at practical examples of the computational blocks that constitute a COFDM modem and then examining examples of COFDM chips.

Abstract: This article discusses the VLSI implications of high-speed coded orthogonal frequency-division multiplexing modulation. This is achieved by looking at practical examples of the computational blocks that constitute a COFDM modem and then examining examples of COFDM chips.

77 citations

"New continuous-flow mixed-radix (CF..." refers background in this paper

...The memory is a dominant component in terms of hardware complexity and power consumption [7]....
...In multicarrier modulation, data symbols are transmitted in parallel on multiple subcarriers [7]....

Proceedings Article•DOI•

A pipeline FFT processor

Weidong Li¹, L. Wanhammar•Institutions (1)

Linköping University¹

20 Oct 1999

TL;DR: This work discusses the design and implementation of a high-speed, low power 1024-point pipeline FFT processor, which is efficient in terms of power consumption and chip area.

Abstract: We discuss the design and implementation of a high-speed, low-power 1024-point pipeline FFT processor. Key features are flexible internal data length and a novel processing element. The FFT processor, which is implemented in a standard 0.35 μm CMOS process, is efficient in terms of power consumption and chip area.

65 citations