A Generalized Conflict-Free Memory Addressing Scheme for Continuous-Flow Parallel-Processing FFT Processors With Rescheduling

doi:10.1109/TVLSI.2010.2077314

Journal Article•DOI•

A Generalized Conflict-Free Memory Addressing Scheme for Continuous-Flow Parallel-Processing FFT Processors With Rescheduling

Pei-Yun Tsai¹, Chung-Yi Lin¹•Institutions (1)

National Central University¹

01 Dec 2011-IEEE Transactions on Very Large Scale Integration Systems (IEEE)-Vol. 19, Iss: 12, pp 2290-2302

TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC) is presented.

Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutator (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.
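
The abstract does not give the addressing function itself. As a rough illustration of what "conflict-free" means for a multibank memory, the sketch below uses a simpler, well-known rule from the in-place FFT literature, not the generalized scheme of this paper: an index is stored in the bank given by the sum of its radix-r digits modulo r. The names `digits`, `bank`, and `is_conflict_free` are hypothetical, and a single radix-r butterfly per cycle is assumed (no parallel processing units, no continuous-flow buffering).

```python
def digits(index, radix, num_digits):
    """Return the base-`radix` digits of `index`, least significant first."""
    out = []
    for _ in range(num_digits):
        out.append(index % radix)
        index //= radix
    return out


def bank(index, radix, num_digits):
    """Illustrative bank rule (an assumption): digit sum modulo the radix."""
    return sum(digits(index, radix, num_digits)) % radix


def is_conflict_free(radix, num_digits):
    """Check every butterfly of every stage of a radix-`radix` FFT.

    A butterfly at stage s touches the `radix` indices that agree in all
    digits except digit s, so their digit sums are distinct modulo `radix`.
    """
    n = radix ** num_digits
    for stage in range(num_digits):
        weight = radix ** stage
        for base in range(n):
            if (base // weight) % radix != 0:
                continue  # visit each butterfly once, via its member with digit s == 0
            operands = [base + k * weight for k in range(radix)]
            banks = {bank(i, radix, num_digits) for i in operands}
            if len(banks) != radix:
                return False  # two operands would hit the same bank
    return True


print(is_conflict_free(radix=4, num_digits=3))  # 64-point FFT, radix-4 butterflies
```

With this rule, all operands of a butterfly can be read (and written back in place) in one cycle from single-access banks. Handling several butterflies in parallel, variable FFT sizes, and continuous flow at once is the harder problem the paper addresses.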

Citations

Journal Article•DOI•

MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems

Kai-Jiun Yang¹, Shang-Ho Tsai¹, G. C. H. Chuang•Institutions (1)

National Chiao Tung University¹

01 Apr 2013-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length is presented.

Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-Ns butterflies at each stage, where Ns is the number of data streams, so that there is only one butterfly needed in each stage. Consequently, a 100% utilization rate in computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let Ns = 4 and implement a 4-stream FFT/IFFT processor with variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP long term evolution applications. The processor was implemented with a UMC 90-nm CMOS technology with a core area of 3.1 mm². The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT, respectively, in post-layout simulation. Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show advantages of the proposed scheme in terms of area and power consumption.

99 citations

Cites background from "A Generalized Conflict-Free Memory ..."

...Continuous-flow mixed-radix (CFMR) FFT [8], [9] utilizes two N-sample memories to generate a continuous output stream....
[...]

Journal Article•DOI•

An In-Place FFT Architecture for Real-Valued Signals

Manohar Ayinala¹, Yingjie Lao¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

08 Aug 2013-IEEE Transactions on Circuits and Systems Ii-express Briefs

TL;DR: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph.

Abstract: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals. The proposed computation is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. A new processing element (PE) is proposed using two radix-2 butterflies that can process four inputs in parallel. A novel conflict-free memory-addressing scheme is proposed to ensure the continuous operation of the FFT processor. Furthermore, the addressing scheme is extended to support multiple parallel PEs. The proposed real-FFT processor simultaneously requires fewer computation cycles and lower hardware cost compared to prior work. For example, the proposed design with two PEs reduces the computation cycles by a factor of 2 for a 256-point real fast Fourier transform (RFFT) compared to a prior work while maintaining a lower hardware complexity. The number of computation cycles is reduced proportionately with the increase in the number of PEs.

56 citations

Cites methods from "A Generalized Conflict-Free Memory ..."

...Higher radix butterfly units and/or parallel processing can be utilized to increase the throughput [9]....
[...]

Journal Article•DOI•

A High-Throughput Radix-16 FFT Processor With Parallel and Normal Input/Output Ordering for IEEE 802.15.3c Systems

Shen-Jui Huang¹, Sau-Gee Chen¹•Institutions (1)

National Chiao Tung University¹

10 Jan 2012-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: The proposed radix-16 FFT processor is area-efficient with high data processing rate and hardware utilization efficiency, and a conflict-free multibank memory addressing scheme is devised to support up to 16-way parallel and normal-order data input/output.

Abstract: This paper presents a high-throughput FFT processor for the IEEE 802.15.3c (WPAN) standard. To meet the throughput requirement of 2.59 Giga-samples/s, the radix-16 FFT algorithm is adopted and reformulated to an efficient form so that the required number of butterfly stages is reduced. Specifically, the radix-16 butterfly processing element consists of two cascaded parallel/pipelined radix-4 butterfly units. It facilitates low-complexity realization of the radix-16 butterfly operation and high operation speed due to its optimized pipelined structure. Besides, a new three-stage multiplier for twiddle factor multiplication is also proposed, which has lower area and power consumption than conventional complex multipliers. Moreover, a conflict-free multibank memory addressing scheme is devised to support up to 16-way parallel and normal-order data input/output. Because the input/output data need not be reordered, this scheme helps achieve a high-throughput design. Equipped with those new performance-boosting techniques, overall the proposed radix-16 FFT processor is area-efficient with high data processing rate and hardware utilization efficiency. The EDA synthesis results show that the whole FFT processor area is 0.93 mm², and the power consumption is 42 mW in a 90 nm process. The SQNR performance is 57 dB with 12-bit wordlength implementation.

54 citations

Journal Article•DOI•

A Novel Memory-Based FFT Architecture for Real-Valued Signals Based on a Radix-2 Decimation-In-Frequency Algorithm

Zhen-Guo Ma¹, Xiao-Bo Yin¹, Feng Yu¹•Institutions (1)

Zhejiang University¹

20 May 2015-IEEE Transactions on Circuits and Systems Ii-express Briefs

TL;DR: A novel architecture for memory-based fast Fourier transform (FFT) computation for real-valued signals based on radix-2 decimation-in-frequency algorithm to minimize the computation clock cycles and maximize the utilization of the processing element (PE).

Abstract: This brief presents a novel architecture for memory-based fast Fourier transform (FFT) computation for real-valued signals based on the radix-2 decimation-in-frequency algorithm. A superior strategy of stage partition for the real FFT (RFFT) is proposed to minimize the computation clock cycles and maximize the utilization of the processing element (PE). The PE employed in our RFFT architecture can process four inputs in parallel by using two radix-2 butterflies and only two multiplexers. The proposed memory-addressing scheme and control of the multiplexers can be expressed in terms of a counter according to the RFFT computation stage. Furthermore, the proposed RFFT architecture can support more PEs in two dimensions as well. Compared with prior works, the proposed RFFT processors have the advantages of fewer computation cycles and lower hardware usage. The experiment shows that the proposed processor reduces the computation cycles by 17.5% for a 32-point RFFT computation compared with a recently presented work while maintaining lower hardware usage and complexity in the PE design.

41 citations

Cites methods or result from "A Generalized Conflict-Free Memory ..."

...Moreover, one obvious advantage of the proposed RFFT architecture is that the capability of the required memory can be reduced by a factor of 2, as compared with the traditional memory-based complex FFT processors in [9] and [15]....
[...]
..., pipelined [8] and memory-based architectures [9]....
[...]
...These architectures are adopted in many applications such as optical coherence tomography in image processing [1], orthogonal frequency-division multiplexing and discrete multitone in communication [9], and wireless sensor network [10]....
[...]

Journal Article•DOI•

Efficient Memory-Addressing Algorithms for FFT Processor Design

Hsin-Fu Luo¹, Yi-Jun Liu, Ming-Der Shieh¹•Institutions (1)

National Cheng Kung University¹

01 Oct 2015-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A data relocation scheme that merges multiple banks to lower the area requirement and power dissipation of memory-based FFT architectures is proposed and the proposed memory-addressing method can effectively deal with single-port, merged-bank memory with high-radix processing elements.

Abstract: This paper explores efficient memory management schemes for memory-based architectures of the fast Fourier transform (FFT). A data relocation scheme that merges multiple banks to lower the area requirement and power dissipation of memory-based FFT architectures is proposed. The proposed memory-addressing method can effectively deal with single-port, merged-bank memory with high-radix processing elements. Compared with conventional memory-based FFT designs using dual-port memory, the derived architecture has better performance in terms of area and power consumption. The proposed scheme is extended to a cached-memory FFT architecture to further reduce power dissipation. An 8192-point cached-memory FFT processor is implemented for digital video broadcasting-terrestrial/handheld applications by using 0.18-μm 1P6M CMOS technology. Experimental results show that the proposed memory scheme consumes 10.1%–29.3% less area and 9.6%–67.9% less power compared with those of the multibank design.

39 citations

References

Proceedings Article•DOI•

Designing pipeline FFT processor for OFDM (de)modulation

Shousheng He¹, M. Torkelson•Institutions (1)

Lund University¹

29 Sep 1998

TL;DR: By exploiting the spatial regularity of the new algorithm, the requirements for both dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized and the area/power efficiency has been enhanced.

Abstract: The FFT processor is one of the key components in the implementation of wideband OFDM systems. Architectures with a structured pipeline have been used to meet the fast, real-time processing demand and low-power consumption requirement in a mobile environment. Architectures based on new forms of FFT, the radix-2^i algorithm derived by cascade decomposition, are proposed. By exploiting the spatial regularity of the new algorithm, the requirements for both dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized. Progressive wordlength adjustment has been introduced to optimize the total memory size with a given signal-to-quantization-noise-ratio (SQNR) requirement in fixed-point processing. A new complex multiplier based on distributed arithmetic further enhanced the area/power efficiency of the design. A single-chip processor for 1 K complex point FFT transform is used to demonstrate the design issues under consideration.

322 citations

Journal Article•DOI•

A low-power, high-performance, 1024-point FFT processor

Bevan M. Baas¹•Institutions (1)

Stanford University¹

01 Mar 1999-IEEE Journal of Solid-state Circuits

TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 μm CMOS process and is fully functional on first-pass silicon.

Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460,000-transistor design has been fabricated in a standard 0.7 μm (L_poly = 0.6 μm) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 μs while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz, a clock rate 2.6 times greater than the previously fastest rate.
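
As a quick check on the figures quoted for the 1.1 V operating point, the raw energy per transform is power times execution time, roughly 3.1 μJ per 1024-point FFT. The sketch below only multiplies the two quoted numbers; the helper name is hypothetical, and the result is not the "adjusted" efficiency metric, whose definition the abstract does not give.

```python
def energy_per_fft_joules(power_watts, time_seconds):
    """Raw energy for one transform: average power times execution time."""
    return power_watts * time_seconds


# Figures quoted in the abstract for the 1.1 V operating point.
energy = energy_per_fft_joules(power_watts=9.5e-3, time_seconds=330e-6)
print(f"{energy * 1e6:.3f} uJ per 1024-point FFT")
```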

319 citations

"A Generalized Conflict-Free Memory ..." refers background in this paper

...Among them, pipelined single-path delay feedback (SDF) architecture [3]–[6] and memory-based/cache-memory-based architecture [7]–[9] are two popular solutions....
[...]

Journal Article•DOI•

Design of an FFT/IFFT Processor for MIMO OFDM Systems

Yu-Wei Lin¹, Chen-Yi Lee²•Institutions (2)

MediaTek¹, National Chiao Tung University²

16 Apr 2007-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: A novel 128/64 point fast Fourier transform (FFT)/ inverse FFT (IFFT) processor for the applications in a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor.

Abstract: In this paper, we present a novel 128/64 point fast Fourier transform (FFT)/inverse FFT (IFFT) processor for the applications in a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor. The unfolding mixed-radix multipath delay feedback FFT architecture is proposed to efficiently deal with multiple data sequences. The proposed processor not only supports the operation of FFT/IFFT in 128 points and 64 points but can also provide different throughput rates for 1-4 simultaneous data sequences to meet IEEE 802.11n requirements. Furthermore, less hardware complexity is needed in our design compared with the traditional four-parallel approach. The proposed FFT/IFFT processor is designed in a 0.13-μm single-poly and eight-metal CMOS process. The core area is 660 × 2142 μm², including an FFT/IFFT processor and a test module. At the operation clock rate of 40 MHz, our proposed processor can calculate 128-point FFT with four independent data sequences within 3.2 μs, meeting IEEE 802.11n standard requirements.
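
The timing claim can be turned into an average data rate. Assuming the quoted 3.2 μs window covers all four 128-point sequences, 40 MHz gives 128 clock cycles, so the processor must absorb four samples per cycle on average. The function below is a hypothetical helper that does only this arithmetic.

```python
def samples_per_cycle(clock_mhz, latency_ns, fft_points, streams):
    """Average samples consumed per clock cycle over one transform window."""
    cycles = clock_mhz * latency_ns // 1000  # MHz * ns / 1000 = cycles
    return cycles, (fft_points * streams) / cycles


cycles, rate = samples_per_cycle(clock_mhz=40, latency_ns=3200,
                                 fft_points=128, streams=4)
print(cycles, rate)  # 128 4.0
```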

143 citations

"A Generalized Conflict-Free Memory ..." refers background in this paper

...Among them, pipelined single-path delay feedback (SDF) architecture [3]–[6] and memory-based/cache-memory-based architecture [7]–[9] are two popular solutions....
[...]

Journal Article•DOI•

New continuous-flow mixed-radix (CFMR) FFT Processor using novel in-place strategy

B.G. Jo, Myung Hoon Sunwoo¹•Institutions (1)

Ajou University¹

05 Jul 2005-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: A new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy that can reduce hardware complexity and computation cycles compared with existing FFT processors is proposed.

Abstract: The paper proposes a new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy. The existing in-place strategy supports only a fixed-radix FFT algorithm. In contrast, the proposed in-place strategy can support the MR algorithm, which allows CF FFT computations regardless of the length of FFT. The novel in-place strategy is made by interchanging storage locations of butterfly outputs. The CFMR FFT processor provides the MR algorithm, the in-place strategy, and the CF FFT computations at the same time. The CFMR FFT processor requires only two N-word memories due to the proposed in-place strategy. In addition, it uses one butterfly unit that can perform either one radix-4 butterfly or two radix-2 butterflies. The CFMR FFT processor using the 0.18 μm SEC cell library consists of 37,000 gates excluding memories, requires only 640 clock cycles for a 512-point FFT and runs at 100 MHz. Therefore, the CFMR FFT processor can reduce hardware complexity and computation cycles compared with existing FFT processors.
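
The 640-cycle figure for a 512-point FFT is consistent with a simple count. Since 512 = 4^4 × 2, there are four radix-4 stages of 128 butterflies each plus one radix-2 stage of 256 butterflies processed two per cycle. The sketch below is an illustration under those assumptions (one butterfly-unit operation per clock, no pipeline or memory overhead); `cfmr_cycles` is a hypothetical name, not taken from the paper.

```python
def cfmr_cycles(n):
    """Butterfly cycles for an n-point radix-4/2 FFT (n a power of two, n >= 4).

    Assumes one butterfly unit that completes either one radix-4 butterfly
    or two radix-2 butterflies per clock cycle, with no other overhead.
    """
    log2n = n.bit_length() - 1
    if n < 4 or (1 << log2n) != n:
        raise ValueError("n must be a power of two and at least 4")
    radix4_stages, radix2_stages = divmod(log2n, 2)
    radix4_cycles = radix4_stages * (n // 4)       # n/4 butterflies per stage
    radix2_cycles = radix2_stages * (n // 2 // 2)  # n/2 butterflies, two per cycle
    return radix4_cycles + radix2_cycles


print(cfmr_cycles(512))  # 640
```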

128 citations

"A Generalized Conflict-Free Memory ..." refers background or methods in this paper

...1, it supports the continuous-flow operation and merges the input and output buffer so as to minimize the total memory requirement to as in [9] and [10]....
[...]
...However, it has been shown in [9] and [10] that a continuous-flow FFT processor can minimize the storage to...
[...]
...In the past, numerous FFT processors have been designed [2]–[9]....
[...]
...In [9] and [11], an in-place strategy was applied for the radix-2/4 butterfly unit....
[...]
...Among them, pipelined single-path delay feedback (SDF) architecture [3]–[6] and memory-based/cache-memory-based architecture [7]–[9] are two popular solutions....
[...]

Journal Article•DOI•

A dynamic scaling FFT processor for DVB-T applications

Yu-Wei Lin, Hsuan-Yu Liu, Chen-Yi Lee

25 Oct 2004-IEEE Journal of Solid-state Circuits

TL;DR: This paper presents an 8192-point FFT processor for DVB-T systems, in which a three-step radix-8 FFT algorithm, a new dynamic scaling approach, and a novel matrix prefetch buffer are exploited.

Abstract: This paper presents an 8192-point FFT processor for DVB-T systems, in which a three-step radix-8 FFT algorithm, a new dynamic scaling approach, and a novel matrix prefetch buffer are exploited. About 64 K bit memory space can be saved in the 8 K point FFT by the proposed dynamic scaling approach. Moreover, with data scheduling and pre-fetched buffering, single-port memory can be adopted without degrading throughput rate. A test chip for 8 K mode DVB-T system has been designed and fabricated using 0.18-μm single-poly six-metal CMOS process with core area of 4.84 mm². Power dissipation is about 25.2 mW at 20 MHz.

111 citations

"A Generalized Conflict-Free Memory ..." refers methods in this paper

...We can see that a specific and similar rescheduling technique for one-half complex multiplications after the radix-8 butterfly has been used in [8]....
[...]