Journal ArticleDOI

Low-power variable-length fast Fourier transform processor

08 Jul 2005 - Vol. 152, Iss. 4, pp. 499-506
TL;DR: A variable-length FFT processor design based on a radix-2/4/8 algorithm and a single-path delay feedback architecture; the 2048-point FFT operates correctly up to 45 MHz with a 3.3 V supply voltage and a power consumption of 640 mW.
Abstract: Fast Fourier transform (FFT) processing is one of the key procedures in the popular orthogonal frequency division multiplexing (OFDM) communication systems. Structured pipeline architectures and low power consumption are the main concerns for its VLSI implementation. In the paper, the authors report a variable-length FFT processor design that is based on a radix-2/4/8 algorithm and a single-path delay feedback architecture. The processor can be used in various OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), asymmetric digital subscriber loop (ADSL) and very-high-speed digital subscriber loop (VDSL). To reduce power consumption and chip area, special current-mode SRAMs are adopted to replace shift registers in the delay lines. In addition, techniques including complex multipliers requiring only three real multiplications, and reduced sine/cosine tables, are adopted. The chip is fabricated in a 0.35 μm CMOS process and measures 3900 μm × 5500 μm. According to the measured results, the 2048-point FFT operation functions correctly up to 45 MHz with a 3.3 V supply voltage and a power consumption of 640 mW. In low-power operation, with the supply voltage scaled down to 2.3 V, the processor consumes 176 mW at 17.8 MHz.
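As a concrete illustration of the three-real-multiplication complex multiplier mentioned in the abstract, the following minimal sketch (variable names are illustrative, not taken from the paper) shows the standard identity that trades one multiplication for extra additions:

```python
# Sketch (not the authors' RTL): complex multiplication with 3 real multiplies,
# the classic identity used to reduce hardware multiplier count.
def cmul3(a, b, c, d):
    """(a + jb) * (c + jd) using 3 real multiplications and 5 additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2   # (real part, imaginary part)

# Quick check against the usual 4-multiplication form
assert cmul3(3, 4, 5, 6) == (3*5 - 4*6, 3*6 + 4*5)
```
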
Citations
Book
04 Dec 2007
TL;DR: This timely text on baseband design, OFDM Baseband Receiver Design for Wireless Communications, closes the gap between OFDM theory and implementation and enables the reader to transfer communication receiver concepts into hardware, design wireless receivers with acceptable implementation loss, and achieve low-power designs.
Abstract: Orthogonal frequency-division multiplexing (OFDM) access schemes are becoming more prevalent among cellular and wireless broadband systems, accelerating the need for smaller, more energy efficient receiver solutions. Up to now the majority of OFDM texts have dealt with signal processing aspects. To address the current gap in OFDM integrated circuit (IC) instruction, Chiueh and Tsai have produced this timely text on baseband design. OFDM Baseband Receiver Design for Wireless Communications covers the gamut of OFDM technology, from theories and algorithms to architectures and circuits. Chiueh and Tsai give a concise yet comprehensive look at digital communications fundamentals before explaining modulation and signal processing algorithms in OFDM receivers. Moreover, the authors give detailed treatment of hardware issues, from design methodology to physical IC implementation. The book:

  • Closes the gap between OFDM theory and implementation
  • Enables the reader to transfer communication receiver concepts into hardware, design wireless receivers with acceptable implementation loss, and achieve low-power designs
  • Contains numerous figures to illustrate techniques
  • Features concrete design examples of MC-CDMA systems and cognitive radio applications
  • Presents theoretical discussions that focus on concepts rather than mathematical derivation
  • Provides a much-needed single source of material from numerous papers

Based on course materials for a class in digital communication IC design, this book is ideal for advanced undergraduate or post-graduate students from either VLSI design or signal processing backgrounds. New and experienced engineers in industry working on algorithms or hardware for wireless communications devices will also find this book to be a key reference.

258 citations

Journal ArticleDOI
TL;DR: A design methodology for power and area minimization of flexible FFT processors based on the power-area tradeoff space obtained by adjusting algorithm, architecture, and circuit variables is presented.
Abstract: This paper presents a design methodology for power and area minimization of flexible FFT processors. The methodology is based on the power-area tradeoff space obtained by adjusting algorithm, architecture, and circuit variables. Radix factorization is the main technique for achieving high energy efficiency with flexibility, followed by architecture parallelism and delay line circuits. The flexibility is provided by reconfigurable processing units that support radix-2/4/8/16 factorizations. As a proof of concept, a 128- to 2048-point FFT processor for the 3GPP-LTE standard has been implemented in a 65-nm CMOS process. The processor designed for minimum power-area product is integrated in 1.25 × 1.1 mm² and dissipates 4.05 mW at 0.45 V for the 20 MHz LTE bandwidth. The energy dissipation, ranging from 2.5 to 103.7 nJ/FFT for 128 to 2048 points, makes it the lowest-energy flexible FFT.
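To make the radix-factorization idea concrete, here is a small hedged sketch (not the paper's implementation) that breaks an FFT length into radix-2/4/8/16 stages, preferring large radices so fewer stages with full twiddle multipliers are needed:

```python
# Illustrative sketch: factor a power-of-two FFT length into radix-2/4/8/16 stages,
# taking the largest usable radix first so the pipeline has fewer stages.
def radix_factorize(n, radices=(16, 8, 4, 2)):
    stages = []
    while n > 1:
        for r in radices:
            if n % r == 0:
                stages.append(r)
                n //= r
                break
        else:
            raise ValueError("length is not a power of two")
    return stages

for n in (128, 512, 2048):
    print(n, radix_factorize(n))   # e.g. 2048 -> [16, 16, 8]
```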

120 citations


Cites background from "Low-power variable-length fast Four..."

  • ...The hardware complexity is minimized by only reducing the number of complex (full) multiplications for various radix FFTs....

    [...]

  • ...This comparison reveals that radix factorization has higher impact than circuit-level techniques....

    [...]

Journal ArticleDOI
TL;DR: A multipath delay commutator (MDC)-based architecture and memory scheduling are presented to implement fast Fourier transform (FFT) processors for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length.
Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-Ns butterflies at each stage, where Ns is the number of data streams, so that only one butterfly is needed in each stage. Consequently, a 100% utilization rate of the computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let Ns=4 and implement a 4-stream FFT/IFFT processor with variable lengths of 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP Long Term Evolution applications. The processor was implemented in a UMC 90-nm CMOS technology with a core area of 3.1 mm². The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for the 2048/1024/512/128-point FFT, respectively, in post-layout simulation. Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show the advantages of the proposed scheme in terms of area and power consumption.
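The output bit-reversed reordering that such memory scheduling has to realize can be sketched as follows; this is only the basic single-stream permutation, while the paper's four-stream scheduling is considerably more involved:

```python
# Hedged sketch: generate the bit-reversed output order for an n-point FFT
# (n must be a power of two); the paper's memory scheduling realizes this
# kind of reordering without extra buffers.
def bit_reverse_indices(n):
    bits = n.bit_length() - 1
    return [int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)]

order = bit_reverse_indices(8)   # [0, 4, 2, 6, 1, 5, 3, 7]
```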

99 citations


Additional excerpts

  • ...Proposed, [26], [27], [29], [31], [30], [24], [28], [25], [20], [14], [12] (comparison-table excerpt)...

    [...]

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised machine-learning classifiers that dynamically adjust their computational effort depending on the difficulty of the input data, while maintaining the same level of accuracy.
Abstract: Supervised machine-learning algorithms are used to solve classification problems across the entire spectrum of computing platforms, from data centers to wearable devices, and place significant demand on their computational capabilities. In this paper, we propose scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised machine-learning classifiers. We observe that the inherent classification difficulty varies widely across inputs in real-world datasets; only a small fraction of the inputs truly require the full computational effort of the classifier, while the large majority can be classified correctly with very low effort. Yet, state-of-the-art classification algorithms expend equal effort on all inputs, irrespective of their difficulty. To address this inefficiency, we introduce the concept of scalable-effort classifiers, or classifiers that dynamically adjust their computational effort depending on the difficulty of the input data, while maintaining the same level of accuracy. Scalable effort classifiers are constructed by utilizing a chain of classifiers with increasing levels of complexity (and accuracy). Scalable effort execution is achieved by modulating the number of stages used for classifying a given input. Every stage in the chain contains an ensemble of biased classifiers, where each biased classifier is trained to detect a single class more accurately. The degree of consensus between the biased classifiers' outputs is used to decide whether classification can be terminated at the current stage or not. Our methodology thus allows us to transform any given classification algorithm into a scalable-effort chain. We build scalable-effort versions of 8 popular recognition applications using 3 different classification algorithms. Our experiments demonstrate that scalable-effort classifiers yield 2.79x reduction in average operations per input, which translates to 2.3x and 1.5x improvement in energy for hardware and software implementations, respectively.
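A conceptual sketch of the scalable-effort chain is given below; the consensus rule here (margin between the two best biased-classifier scores) is a simplification of the decision mechanism described in the paper, and all names are illustrative:

```python
# Conceptual sketch (not the authors' code): a chain of increasingly complex
# stages, each holding one biased (one-vs-rest) classifier per class that
# returns a confidence in [0, 1]; classification terminates early once the
# leading class wins by a sufficient margin over the runner-up.
def scalable_effort_classify(x, stages, consensus_margin=0.5):
    best = None
    for stage in stages:                       # stages ordered cheap -> expensive
        scores = [clf(x) for clf in stage]     # one biased classifier per class
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        best = ranked[0]
        if scores[ranked[0]] - scores[ranked[1]] >= consensus_margin:
            return best                        # strong consensus: stop here
    return best                                # fell through to the final stage
```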

74 citations


Cites background from "Low-power variable-length fast Four..."

  • ...RELATED WORK Most previous efforts in building input-aware computational systems have considered application-specific solutions [3, 4]....

    [...]

Journal ArticleDOI
TL;DR: A generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutators (MDC) is presented.
Abstract: This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutators (MDC). The proposed addressing scheme considers the continuous-flow operation with minimum shared memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that the solution to non-conflict memory access satisfying the constraints of the continuous-flow, variable-size, higher-radix, and parallel-processing operations indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and to enhance hardware efficiency. From the results, we can see that the proposed processor has high utilization and efficiency to support flexible configurability for various FFT sizes with fewer computation cycles than the conventional radix-2/radix-4 memory-based FFT processors.
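One classic conflict-free bank-assignment rule, which this paper generalizes to parallel radix-2^q MDC units with continuous flow and variable sizes, assigns each datum to the bank given by the digit-sum of its address; a small sketch under that assumption:

```python
# Illustrative sketch of the digit-sum-modulo-banks rule for conflict-free
# memory access; the paper's generalized scheme is more elaborate.
def bank_of(index, radix, n_banks):
    s = 0
    while index:
        s += index % radix      # accumulate the base-r digits of the address
        index //= radix
    return s % n_banks

# A radix-4 butterfly in a 64-point FFT reads indices that differ in one digit
# (stride 16), so their digit sums mod 4 differ -> no two hit the same bank.
group = [1, 1 + 16, 1 + 32, 1 + 48]
print([bank_of(i, 4, 4) for i in group])     # four distinct banks: [1, 2, 3, 0]
```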

70 citations

References
Journal ArticleDOI
TL;DR: The general technique of parallel transmission on many carriers, called multicarrier modulation (MCM), is explained, and the performance that can be achieved on an undistorted channel and algorithms for achieving that performance are discussed.
Abstract: The general technique of parallel transmission on many carriers, called multicarrier modulation (MCM), is explained. The performance that can be achieved on an undistorted channel and algorithms for achieving that performance are discussed. Ways of dealing with channel impairments and of improving the performance through coding are described, and implementation methods are considered. Duplex operation of MCM and the possible use of this on the general switched telephone network are examined.
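The core of the multicarrier idea can be sketched in a few lines: data symbols are placed on parallel subcarriers with an inverse FFT and recovered with the forward FFT (sizes and constellation here are illustrative, not from the paper):

```python
# Minimal multicarrier modulation sketch: one MCM/OFDM symbol built with an
# inverse FFT and demodulated with the forward FFT (exact on an ideal channel).
import numpy as np

n_carriers = 64
symbols = np.random.choice([1+1j, 1-1j, -1+1j, -1-1j], n_carriers)   # QPSK
time_signal = np.fft.ifft(symbols) * np.sqrt(n_carriers)              # one MCM symbol

recovered = np.fft.fft(time_signal) / np.sqrt(n_carriers)             # demodulation
assert np.allclose(recovered, symbols)
```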

3,995 citations

Book
01 Jan 1975
TL;DR: Bellman and Wing introduced the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.
Abstract: The book is written in a sprightly style and is interesting from cover to cover. The comments, critiques, and summaries that accompany the chapters are very helpful in crystallizing the ideas and answering questions that may arise, particularly for the self-learner. The transparency in the presentation of the material in the book equips the reader to proceed quickly to a wealth of problems included at the end of each chapter. These problems, ranging from elementary to research-level, are very valuable in that a solid working knowledge of the invariant imbedding techniques is acquired, as well as good insight into attacking problems in various applied areas. Furthermore, a useful selection of references is given at the end of each chapter. This book may not appeal to those mathematicians who are interested primarily in the sophistication of mathematical theory, because the authors have deliberately avoided all pseudo-sophistication in attaining transparency of exposition. Precisely for the same reason, the majority of the intended readers, who are applications-oriented and eager to use the techniques quickly in their own fields, will welcome and appreciate the efforts put into writing this book. From a purely mathematical point of view, some of the invariant imbedding results may be considered to be generalizations of the classical theory of first-order partial differential equations, and a part of the analysis of invariant imbedding is still at a somewhat heuristic stage despite successes in many computational applications. However, those who are concerned with mathematical rigor will find opportunities to explore the foundations of the invariant imbedding method. In conclusion, let me quote the following: "What is the best method to obtain the solution to a problem? The answer is, any way that works." (Richard P. Feynman, Engineering and Science, March 1965, Vol. XXVIII, No. 6, p. 9.) In this well-written book, Bellman and Wing have indeed accomplished the task of introducing the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.

3,249 citations

Journal ArticleDOI
L. J. Cimini, Jr.
TL;DR: This paper discusses the analysis and simulation of a technique for combating the effects of multipath propagation and cochannel interference on a narrow-band digital mobile channel; the technique uses the discrete Fourier transform to orthogonally frequency multiplex many narrow subchannels, each signaling at a very low rate, into one high-rate channel.
Abstract: This paper discusses the analysis and simulation of a technique for combating the effects of multipath propagation and cochannel interference on a narrow-band digital mobile channel. This system uses the discrete Fourier transform to orthogonally frequency multiplex many narrow subchannels, each signaling at a very low rate, into one high-rate channel. When this technique is used with pilot-based correction, the effects of flat Rayleigh fading can be reduced significantly. An improvement in signal-to-interference ratio of 6 dB can be obtained over the bursty Rayleigh channel. In addition, with each subchannel signaling at a low rate, this technique can provide added protection against delay spread. To enhance the behavior of the technique in a heavily frequency-selective environment, interpolated pilots are used. A frequency offset reference scheme is employed for the pilots to improve protection against cochannel interference.
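Pilot-based correction of the kind described above can be sketched as follows: the channel gain is measured on known pilot subcarriers and interpolated across the data subcarriers (all values and spacings are illustrative, not from the paper):

```python
# Hedged sketch of pilot-based channel correction on a flat, slowly varying channel.
import numpy as np

n_sub = 64
pilot_idx = np.arange(0, n_sub, 8)                    # every 8th subcarrier is a pilot
pilot_tx = np.ones(len(pilot_idx), dtype=complex)     # known pilot symbols

h_true = np.exp(2j * np.pi * np.arange(n_sub) / 100)  # slowly varying channel gains
rx = h_true * np.ones(n_sub, dtype=complex)           # received subcarrier values

h_at_pilots = rx[pilot_idx] / pilot_tx                # per-pilot channel estimate
h_est = np.interp(np.arange(n_sub), pilot_idx, h_at_pilots.real) \
        + 1j * np.interp(np.arange(n_sub), pilot_idx, h_at_pilots.imag)
equalized = rx / h_est                                # corrected data subcarriers
```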

2,627 citations

Journal ArticleDOI
TL;DR: In this paper, the evolution of CORDIC, an iterative arithmetic computing algorithm capable of evaluating various elementary functions using a unified shift-and-add approach, is reviewed.
Abstract: The evolution of CORDIC, an iterative arithmetic computing algorithm capable of evaluating various elementary functions using a unified shift-and-add approach, and of CORDIC processors is reviewed. A method to utilize a CORDIC processor array to implement digital signal processing algorithms is presented. The approach is to reformulate existing DSP algorithms so that they are suitable for implementation with an array performing circular or hyperbolic rotation operations. Three categories of algorithm are surveyed: linear transformations, digital filters, and matrix-based DSP algorithms.
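A minimal floating-point sketch of CORDIC in circular rotation mode is shown below; hardware implementations use the same shift-and-add iteration in fixed point with a precomputed gain constant:

```python
# CORDIC circular rotation mode (floating point for clarity): rotate (x, y)
# by `angle` radians using only shifts, adds, and a table of atan(2^-i).
import math

def cordic_rotate(x, y, angle, iterations=24):
    k = 1.0
    for i in range(iterations):
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))    # accumulated gain correction
    for i in range(iterations):
        d = 1.0 if angle >= 0 else -1.0                # steer toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * math.atan(2.0 ** -i)
    return x * k, y * k

cx, cy = cordic_rotate(1.0, 0.0, math.pi / 6)          # approx (cos 30°, sin 30°)
```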

492 citations

Proceedings ArticleDOI
29 Sep 1998
TL;DR: By exploiting the spatial regularity of the new algorithm, the requirements for the two dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized and the area/power efficiency has been enhanced.
Abstract: The FFT processor is one of the key components in the implementation of wideband OFDM systems. Architectures with a structured pipeline have been used to meet the fast, real-time processing demand and low-power consumption requirement in a mobile environment. An architecture based on a new form of the FFT, the radix-2^i algorithm derived by cascade decomposition, is proposed. By exploiting the spatial regularity of the new algorithm, the requirements for the two dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized. Progressive wordlength adjustment has been introduced to optimize the total memory size with a given signal-to-quantization-noise-ratio (SQNR) requirement in fixed-point processing. A new complex multiplier based on distributed arithmetic further enhances the area/power efficiency of the design. A single-chip processor for a 1K-complex-point FFT is used to demonstrate the design issues under consideration.
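The key property behind the radix-2^i decomposition can be checked in a few lines: a 4-point DFT splits into two radix-2 butterfly layers whose only inter-stage rotation is the trivial factor -j, so no general complex multiplier is needed there (a sketch of the principle, not the paper's hardware):

```python
# Radix-2^2 principle: a 4-point DFT as two radix-2 butterfly layers with only
# a trivial -j rotation in between (a swap and sign change in hardware).
import numpy as np

def dft4_radix22(x0, x1, x2, x3):
    a, c = x0 + x2, x0 - x2            # first radix-2 butterfly layer
    b, d = x1 + x3, x1 - x3
    d = -1j * d                        # trivial rotation by -j
    return a + b, c + d, a - b, c - d  # X[0], X[1], X[2], X[3]

x = np.random.randn(4) + 1j * np.random.randn(4)
assert np.allclose(dft4_radix22(*x), np.fft.fft(x))
```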

322 citations