Journal ArticleDOI

Low-power variable-length fast Fourier transform processor

08 Jul 2005 - Vol. 152, Iss. 4, pp. 499-506
TL;DR: A variable-length FFT processor design based on a radix-2/4/8 algorithm and a single-path delay feedback architecture; the 2048-point FFT operates correctly up to 45 MHz with a 3.3 V supply voltage and a power consumption of 640 mW.
Abstract: Fast Fourier transform (FFT) processing is one of the key procedures in the popular orthogonal frequency division multiplexing (OFDM) communication systems. Structured pipeline architectures and low power consumption are the main concerns for its VLSI implementation. In the paper, the authors report a variable-length FFT processor design that is based on a radix-2/4/8 algorithm and a single-path delay feedback architecture. The processor can be used in various OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), asymmetric digital subscriber loop (ADSL) and very-high-speed digital subscriber loop (VDSL). To reduce power consumption and chip area, special current-mode SRAMs are adopted to replace shift registers in the delay lines. In addition, techniques including complex multipliers requiring only three real multiplications, and reduced sine/cosine tables, are adopted. The chip is fabricated in a 0.35 μm CMOS process and measures 3900 μm × 5500 μm. According to the measured results, the 2048-point FFT operation functions correctly up to 45 MHz with a 3.3 V supply voltage and a power consumption of 640 mW. In low-power operation, with the supply voltage scaled down to 2.3 V, the processor consumes 176 mW at 17.8 MHz.
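As a concrete illustration of the three-real-multiplication complex multiplier mentioned in the abstract, the following minimal sketch (variable names are illustrative, not taken from the paper) shows the standard identity that trades one multiplication for extra additions:

```python
# Sketch (not the authors' RTL): complex multiplication with 3 real multiplies,
# the classic identity used to reduce hardware multiplier count.
def cmul3(a, b, c, d):
    """(a + jb) * (c + jd) using 3 real multiplications and 5 additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2   # (real part, imaginary part)

# Quick check against the usual 4-multiplication form
assert cmul3(3, 4, 5, 6) == (3*5 - 4*6, 3*6 + 4*5)
```
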
Citations
Book
04 Dec 2007
TL;DR: This timely text on baseband design, OFDM Baseband Receiver Design for Wireless Communications, closes the gap between OFDM theory and implementation and enables the reader to transfer communication receiver concepts into hardware, design wireless receivers with acceptable implementation loss, and achieve low-power designs.
Abstract: Orthogonal frequency-division multiplexing (OFDM) access schemes are becoming more prevalent among cellular and wireless broadband systems, accelerating the need for smaller, more energy efficient receiver solutions. Up to now the majority of OFDM texts have dealt with signal processing aspects. To address the current gap in OFDM integrated circuit (IC) instruction, Chiueh and Tsai have produced this timely text on baseband design. OFDM Baseband Receiver Design for Wireless Communications covers the gamut of OFDM technology, from theories and algorithms to architectures and circuits. Chiueh and Tsai give a concise yet comprehensive look at digital communications fundamentals before explaining modulation and signal processing algorithms in OFDM receivers. Moreover, the authors give detailed treatment of hardware issues, from design methodology to physical IC implementation. The book:

  • Closes the gap between OFDM theory and implementation
  • Enables the reader to transfer communication receiver concepts into hardware, design wireless receivers with acceptable implementation loss, and achieve low-power designs
  • Contains numerous figures to illustrate techniques
  • Features concrete design examples of MC-CDMA systems and cognitive radio applications
  • Presents theoretical discussions that focus on concepts rather than mathematical derivation
  • Provides a much-needed single source of material from numerous papers

Based on course materials for a class in digital communication IC design, this book is ideal for advanced undergraduate or post-graduate students from either VLSI design or signal processing backgrounds. New and experienced engineers in industry working on algorithms or hardware for wireless communications devices will also find this book to be a key reference.

258 citations

Journal ArticleDOI
TL;DR: A design methodology for power and area minimization of flexible FFT processors based on the power-area tradeoff space obtained by adjusting algorithm, architecture, and circuit variables is presented.
Abstract: This paper presents a design methodology for power and area minimization of flexible FFT processors. The methodology is based on the power-area tradeoff space obtained by adjusting algorithm, architecture, and circuit variables. Radix factorization is the main technique for achieving high energy efficiency with flexibility, followed by architecture parallelism and delay line circuits. The flexibility is provided by reconfigurable processing units that support radix-2/4/8/16 factorizations. As a proof of concept, a 128- to 2048-point FFT processor for the 3GPP-LTE standard has been implemented in a 65-nm CMOS process. The processor designed for minimum power-area product is integrated in 1.25 × 1.1 mm² and dissipates 4.05 mW at 0.45 V for the 20 MHz LTE bandwidth. The energy dissipation, ranging from 2.5 to 103.7 nJ/FFT for 128 to 2048 points, makes it the lowest-energy flexible FFT.
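To make the radix-factorization idea concrete, here is a small hedged sketch (not the paper's implementation) that breaks an FFT length into radix-2/4/8/16 stages, preferring large radices so fewer stages with full twiddle multipliers are needed:

```python
# Illustrative sketch: factor a power-of-two FFT length into radix-2/4/8/16 stages,
# taking the largest usable radix first so the pipeline has fewer stages.
def radix_factorize(n, radices=(16, 8, 4, 2)):
    stages = []
    while n > 1:
        for r in radices:
            if n % r == 0:
                stages.append(r)
                n //= r
                break
        else:
            raise ValueError("length is not a power of two")
    return stages

for n in (128, 512, 2048):
    print(n, radix_factorize(n))   # e.g. 2048 -> [16, 16, 8]
```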

120 citations


Cites background from "Low-power variable-length fast Four..."

  • ...The hardware complexity is minimized by only reducing the number of complex (full) multiplications for various radix FFTs....

    [...]

  • ...This comparison reveals that radix factorization has higher impact than circuit-level techniques....

    [...]

Journal ArticleDOI
TL;DR: A multipath delay commutator (MDC)-based architecture and memory scheduling are presented to implement fast Fourier transform (FFT) processors for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length.
Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-Ns butterflies at each stage, where Ns is the number of data streams, so that only one butterfly is needed in each stage. Consequently, a 100% utilization rate of the computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let Ns=4 and implement a 4-stream FFT/IFFT processor with variable lengths of 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP Long Term Evolution applications. The processor was implemented in a UMC 90-nm CMOS technology with a core area of 3.1 mm². The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for the 2048/1024/512/128-point FFT, respectively, in post-layout simulation. Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show the advantages of the proposed scheme in terms of area and power consumption.
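The output bit-reversed reordering that such memory scheduling has to realize can be sketched as follows; this is only the basic single-stream permutation, while the paper's four-stream scheduling is considerably more involved:

```python
# Hedged sketch: generate the bit-reversed output order for an n-point FFT
# (n must be a power of two); the paper's memory scheduling realizes this
# kind of reordering without extra buffers.
def bit_reverse_indices(n):
    bits = n.bit_length() - 1
    return [int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)]

order = bit_reverse_indices(8)   # [0, 4, 2, 6, 1, 5, 3, 7]
```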

99 citations


Additional excerpts

  • ...Proposed, [26], [27], [29], [31], [30], [24], [28], [25], [20], [14], [12] (comparison-table excerpt)...

    [...]

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised machine-learning classifiers that dynamically adjust their computational effort depending on the difficulty of the input data, while maintaining the same level of accuracy.
Abstract: Supervised machine-learning algorithms are used to solve classification problems across the entire spectrum of computing platforms, from data centers to wearable devices, and place significant demand on their computational capabilities. In this paper, we propose scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised machine-learning classifiers. We observe that the inherent classification difficulty varies widely across inputs in real-world datasets; only a small fraction of the inputs truly require the full computational effort of the classifier, while the large majority can be classified correctly with very low effort. Yet, state-of-the-art classification algorithms expend equal effort on all inputs, irrespective of their difficulty. To address this inefficiency, we introduce the concept of scalable-effort classifiers, or classifiers that dynamically adjust their computational effort depending on the difficulty of the input data, while maintaining the same level of accuracy. Scalable effort classifiers are constructed by utilizing a chain of classifiers with increasing levels of complexity (and accuracy). Scalable effort execution is achieved by modulating the number of stages used for classifying a given input. Every stage in the chain contains an ensemble of biased classifiers, where each biased classifier is trained to detect a single class more accurately. The degree of consensus between the biased classifiers' outputs is used to decide whether classification can be terminated at the current stage or not. Our methodology thus allows us to transform any given classification algorithm into a scalable-effort chain. We build scalable-effort versions of 8 popular recognition applications using 3 different classification algorithms. Our experiments demonstrate that scalable-effort classifiers yield 2.79x reduction in average operations per input, which translates to 2.3x and 1.5x improvement in energy for hardware and software implementations, respectively.
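A conceptual sketch of the scalable-effort chain is given below; the consensus rule here (margin between the two best biased-classifier scores) is a simplification of the decision mechanism described in the paper, and all names are illustrative:

```python
# Conceptual sketch (not the authors' code): a chain of increasingly complex
# stages, each holding one biased (one-vs-rest) classifier per class that
# returns a confidence in [0, 1]; classification terminates early once the
# leading class wins by a sufficient margin over the runner-up.
def scalable_effort_classify(x, stages, consensus_margin=0.5):
    best = None
    for stage in stages:                       # stages ordered cheap -> expensive
        scores = [clf(x) for clf in stage]     # one biased classifier per class
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        best = ranked[0]
        if scores[ranked[0]] - scores[ranked[1]] >= consensus_margin:
            return best                        # strong consensus: stop here
    return best                                # fell through to the final stage
```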

74 citations


Cites background from "Low-power variable-length fast Four..."

  • ...RELATED WORK Most previous efforts in building input-aware computational systems have considered application-specific solutions [3, 4]....

    [...]

Journal ArticleDOI
TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutators (MDC) is presented.
Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutators (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.
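One classic conflict-free bank-assignment rule, which this paper generalizes to parallel radix-2^q MDC units with continuous flow and variable sizes, assigns each datum to the bank given by the digit-sum of its address; a small sketch under that assumption:

```python
# Illustrative sketch of the digit-sum-modulo-banks rule for conflict-free
# memory access; the paper's generalized scheme is more elaborate.
def bank_of(index, radix, n_banks):
    s = 0
    while index:
        s += index % radix      # accumulate the base-r digits of the address
        index //= radix
    return s % n_banks

# A radix-4 butterfly in a 64-point FFT reads indices that differ in one digit
# (stride 16), so their digit sums mod 4 differ -> no two hit the same bank.
group = [1, 1 + 16, 1 + 32, 1 + 48]
print([bank_of(i, 4, 4) for i in group])     # four distinct banks: [1, 2, 3, 0]
```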

70 citations

References
Journal ArticleDOI
TL;DR: The general technique of parallel transmission on many carriers, called multicarrier modulation (MCM), is explained, and the performance that can be achieved on an undistorted channel and algorithms for achieving that performance are discussed.
Abstract: The general technique of parallel transmission on many carriers, called multicarrier modulation (MCM), is explained. The performance that can be achieved on an undistorted channel and algorithms for achieving that performance are discussed. Ways of dealing with channel impairments and of improving the performance through coding are described, and implementation methods are considered. Duplex operation of MCM and the possible use of this on the general switched telephone network are examined.
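The core of the multicarrier idea can be sketched in a few lines: data symbols are placed on parallel subcarriers with an inverse FFT and recovered with the forward FFT (sizes and constellation here are illustrative, not from the paper):

```python
# Minimal multicarrier modulation sketch: one MCM/OFDM symbol built with an
# inverse FFT and demodulated with the forward FFT (exact on an ideal channel).
import numpy as np

n_carriers = 64
symbols = np.random.choice([1+1j, 1-1j, -1+1j, -1-1j], n_carriers)   # QPSK
time_signal = np.fft.ifft(symbols) * np.sqrt(n_carriers)              # one MCM symbol

recovered = np.fft.fft(time_signal) / np.sqrt(n_carriers)             # demodulation
assert np.allclose(recovered, symbols)
```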

3,995 citations

Book
01 Jan 1975
TL;DR: Bellman and Wing introduced the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.
Abstract: The book is written in a sprightly style and is interesting from cover to cover. The comments, critiques, and summaries that accompany the chapters are very helpful in crystallizing the ideas and answering questions that may arise, particularly for the self-learner. The transparency in the presentation of the material in the book equips the reader to proceed quickly to a wealth of problems included at the end of each chapter. These problems, ranging from elementary to research-level, are very valuable in that a solid working knowledge of the invariant imbedding techniques is acquired, as well as good insight into attacking problems in various applied areas. Furthermore, a useful selection of references is given at the end of each chapter. This book may not appeal to those mathematicians who are interested primarily in the sophistication of mathematical theory, because the authors have deliberately avoided all pseudo-sophistication in attaining transparency of exposition. Precisely for the same reason, the majority of the intended readers, who are applications-oriented and eager to use the techniques quickly in their own fields, will welcome and appreciate the efforts put into writing this book. From a purely mathematical point of view, some of the invariant imbedding results may be considered to be generalizations of the classical theory of first-order partial differential equations, and a part of the analysis of invariant imbedding is still at a somewhat heuristic stage despite successes in many computational applications. However, those who are concerned with mathematical rigor will find opportunities to explore the foundations of the invariant imbedding method. In conclusion, let me quote the following: "What is the best method to obtain the solution to a problem? The answer is, any way that works." (Richard P. Feynman, Engineering and Science, March 1965, Vol. XXVIII, No. 6, p. 9.) In this well-written book, Bellman and Wing have indeed accomplished the task of introducing the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.

3,249 citations

Journal ArticleDOI
L. J. Cimini, Jr.
TL;DR: This paper discusses the analysis and simulation of a technique for combating the effects of multipath propagation and cochannel interference on a narrow-band digital mobile channel; the technique uses the discrete Fourier transform to orthogonally frequency multiplex many narrow subchannels, each signaling at a very low rate, into one high-rate channel.
Abstract: This paper discusses the analysis and simulation of a technique for combating the effects of multipath propagation and cochannel interference on a narrow-band digital mobile channel. This system uses the discrete Fourier transform to orthogonally frequency multiplex many narrow subchannels, each signaling at a very low rate, into one high-rate channel. When this technique is used with pilot-based correction, the effects of flat Rayleigh fading can be reduced significantly. An improvement in signal-to-interference ratio of 6 dB can be obtained over the bursty Rayleigh channel. In addition, with each subchannel signaling at a low rate, this technique can provide added protection against delay spread. To enhance the behavior of the technique in a heavily frequency-selective environment, interpolated pilots are used. A frequency offset reference scheme is employed for the pilots to improve protection against cochannel interference.
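Pilot-based correction of the kind described above can be sketched as follows: the channel gain is measured on known pilot subcarriers and interpolated across the data subcarriers (all values and spacings are illustrative, not from the paper):

```python
# Hedged sketch of pilot-based channel correction on a flat, slowly varying channel.
import numpy as np

n_sub = 64
pilot_idx = np.arange(0, n_sub, 8)                    # every 8th subcarrier is a pilot
pilot_tx = np.ones(len(pilot_idx), dtype=complex)     # known pilot symbols

h_true = np.exp(2j * np.pi * np.arange(n_sub) / 100)  # slowly varying channel gains
rx = h_true * np.ones(n_sub, dtype=complex)           # received subcarrier values

h_at_pilots = rx[pilot_idx] / pilot_tx                # per-pilot channel estimate
h_est = np.interp(np.arange(n_sub), pilot_idx, h_at_pilots.real) \
        + 1j * np.interp(np.arange(n_sub), pilot_idx, h_at_pilots.imag)
equalized = rx / h_est                                # corrected data subcarriers
```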

2,627 citations

Journal ArticleDOI
TL;DR: In this paper, the evolution of CORDIC, an iterative arithmetic computing algorithm capable of evaluating various elementary functions using a unified shift-and-add approach, is reviewed.
Abstract: The evolution of CORDIC, an iterative arithmetic computing algorithm capable of evaluating various elementary functions using a unified shift-and-add approach, and of CORDIC processors is reviewed. A method to utilize a CORDIC processor array to implement digital signal processing algorithms is presented. The approach is to reformulate existing DSP algorithms so that they are suitable for implementation with an array performing circular or hyperbolic rotation operations. Three categories of algorithm are surveyed: linear transformations, digital filters, and matrix-based DSP algorithms.
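A minimal floating-point sketch of CORDIC in circular rotation mode is shown below; hardware implementations use the same shift-and-add iteration in fixed point with a precomputed gain constant:

```python
# CORDIC circular rotation mode (floating point for clarity): rotate (x, y)
# by `angle` radians using only shifts, adds, and a table of atan(2^-i).
import math

def cordic_rotate(x, y, angle, iterations=24):
    k = 1.0
    for i in range(iterations):
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))    # accumulated gain correction
    for i in range(iterations):
        d = 1.0 if angle >= 0 else -1.0                # steer toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * math.atan(2.0 ** -i)
    return x * k, y * k

cx, cy = cordic_rotate(1.0, 0.0, math.pi / 6)          # approx (cos 30°, sin 30°)
```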

492 citations

Proceedings ArticleDOI
29 Sep 1998
TL;DR: By exploiting the spatial regularity of the new algorithm, the requirements for the two dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized and the area/power efficiency has been enhanced.
Abstract: The FFT processor is one of the key components in the implementation of wideband OFDM systems. Architectures with a structured pipeline have been used to meet the fast, real-time processing demand and low-power consumption requirement in a mobile environment. An architecture based on a new form of the FFT, the radix-2^i algorithm derived by cascade decomposition, is proposed. By exploiting the spatial regularity of the new algorithm, the requirements for the two dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized. Progressive wordlength adjustment has been introduced to optimize the total memory size with a given signal-to-quantization-noise-ratio (SQNR) requirement in fixed-point processing. A new complex multiplier based on distributed arithmetic further enhances the area/power efficiency of the design. A single-chip processor for a 1K-complex-point FFT is used to demonstrate the design issues under consideration.
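The key property behind the radix-2^i decomposition can be checked in a few lines: a 4-point DFT splits into two radix-2 butterfly layers whose only inter-stage rotation is the trivial factor -j, so no general complex multiplier is needed there (a sketch of the principle, not the paper's hardware):

```python
# Radix-2^2 principle: a 4-point DFT as two radix-2 butterfly layers with only
# a trivial -j rotation in between (a swap and sign change in hardware).
import numpy as np

def dft4_radix22(x0, x1, x2, x3):
    a, c = x0 + x2, x0 - x2            # first radix-2 butterfly layer
    b, d = x1 + x3, x1 - x3
    d = -1j * d                        # trivial rotation by -j
    return a + b, c + d, a - b, c - d  # X[0], X[1], X[2], X[3]

x = np.random.randn(4) + 1j * np.random.randn(4)
assert np.allclose(dft4_radix22(*x), np.fft.fft(x))
```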

322 citations