scispace - formally typeset
Search or ask a question
Author

Chung-Yi Lin

Bio: Chung-Yi Lin is an academic researcher from National Central University. The author has contributed to research in topics: Shared memory & Memory address. The author has an hindex of 1, co-authored 1 publications receiving 61 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2q multi-path delay commutator (MDC) is presented.
Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2q multi-path delay commutator (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.

70 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: An multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length is presented.
Abstract: This paper presents an multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-Ns butterflies at each stage, where Ns is the number of data streams, so that there is only one butterfly needed in each stage. Consequently, a 100% utilization rate in computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let Ns=4 and implement a 4-stream FFT/IFFT processor with variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP long term evolution applications. The processor was implemented with an UMC 90-nm CMOS technology with a core area of 3.1 mm2. The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT, respectively in the post-layout simulation. Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show advantages of the proposed scheme in terms of area and power consumption.

99 citations

Journal ArticleDOI
TL;DR: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph.
Abstract: This brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals. The proposed computation is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. A new processing element (PE) is proposed using two radix-2 butterflies that can process four inputs in parallel. A novel conflict-free memory-addressing scheme is proposed to ensure the continuous operation of the FFT processor. Furthermore, the addressing scheme is extended to support multiple parallel PEs. The proposed real-FFT processor simultaneously requires fewer computation cycles and lower hardware cost compared to prior work. For example, the proposed design with two PEs reduces the computation cycles by a factor of 2 for a 256-point real fast Fourier transform (RFFT) compared to a prior work while maintaining a lower hardware complexity. The number of computation cycles is reduced proportionately with the increase in the number of PEs.

56 citations

Journal ArticleDOI
TL;DR: The proposed radix-16 FFT processor is area-efficient with high data processing rate and hardware utilization efficiency, and a conflict-free multibank memory addressing scheme is devised to support up to 16-way parallel and normal-order data input/output.
Abstract: This paper presents a high-throughput FFT processor for IEEE 802.15.3c (WPANs) standard. To meet the throughput requirement of 2.59 Giga-samples/s, radix-16 FFT algorithm is adopted and reformulated to an efficient form so that the required number of butterfly stages is reduced. Specifically, the radix-16 butterfly processing element consists of two cascaded parallel/pipelined radix-4 butterfly units. It facilitates low-complexity realization of radix-16 butterfly operation and high operation speed due to its optimized pipelined structure. Besides, a new three-stage multiplier for twiddle factor multiplication is also proposed, which has lower area and power consumption than conventional complex multipliers. Moreover, a conflict-free multibank memory addressing scheme is devised to support up to 16-way parallel and normal-order data input/output. Without needing to reorder the input/output data, this scheme helps a high-throughput design result. Equipped with those new performance-boosting techniques, overall the proposed radix-16 FFT processor is area-efficient with high data processing rate and hardware utilization efficiency. The EDA synthesis results show that whole FFT processor area is 0.93 mm2, and the power consumption is 42 mW with 90 nm process. The SQNR performance is 57 dB with 12-bit wordlength implementation.

54 citations

Journal ArticleDOI
TL;DR: A novel architecture for memory-based fast Fourier transform (FFT) computation for real-valued signals based on radix-2 decimation-in-frequency algorithm to minimize the computation clock cycles and maximize the utilization of the processing element (PE).
Abstract: This brief presents a novel architecture for memory-based fast Fourier transform (FFT) computation for real-valued signals based on radix-2 decimation-in-frequency algorithm. A superior strategy of stage partition for the real FFT (RFFT) is proposed to minimize the computation clock cycles and maximize the utilization of the processing element (PE). The PE employed in our RFFT architecture can process four inputs in parallel by using two radix-2 butterflies and only two multiplexers. The proposed memory-addressing scheme and control of the multiplexers can be expressed in terms of a counter according to the RFFT computation stage. Furthermore, the proposed RFFT architecture can support more PEs in two dimensions as well. Compared with prior works, the proposed RFFT processors have the advantages of fewer computation cycles and lower hardware usage. The experiment shows that the proposed processor reduces the computation cycles by a factor of 17.5% for a 32-point RFFT computation compared with a recently presented work while maintaining lower hardware usage and complexity in the PE design.

41 citations

Journal ArticleDOI
TL;DR: A data relocation scheme that merges multiple banks to lower the area requirement and power dissipation of memory-based FFT architectures is proposed and the proposed memory-addressing method can effectively deal with single-port, merged-bank memory with high-radix processing elements.
Abstract: This paper explores efficient memory management schemes for memory-based architectures of the fast Fourier transform (FFT). A data relocation scheme that merges multiple banks to lower the area requirement and power dissipation of memory-based FFT architectures is proposed. The proposed memory-addressing method can effectively deal with single-port, merged-bank memory with high-radix processing elements. Compared with conventional memory-based FFT designs using dual-port memory, the derived architecture has better performance in terms of area and power consumption. The proposed scheme is extended to a cached-memory FFT architecture to further reduce power dissipation. An 8192-point cached-memory FFT processor is implemented for digital video broadcasting-terrestrial/handheld applications by using 0.18- $\mu $ m 1P6M CMOS technology. Experimental results show that the proposed memory scheme consumes 10.1%–29.3% less area and 9.6%–67.9% less power compared with those of the multibank design.

39 citations